Showing papers on "Spectrogram published in 2010"


Journal ArticleDOI
TL;DR: In this article, a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals, is considered.
Abstract: We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces competitive results with respect to state-of-the-art as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).

636 citations
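
For readers unfamiliar with the Itakura-Saito (IS) divergence named in the abstract above, its standard scalar form is d(x|y) = x/y - log(x/y) - 1. The sketch below evaluates it for a small power spectrogram V against a low-rank model W·H. The function names and toy matrices are illustrative assumptions, not taken from the paper, and no EM or multiplicative update rule is implemented here.

```python
import math

def is_divergence(x, y):
    """Itakura-Saito divergence between two positive scalars."""
    ratio = x / y
    return ratio - math.log(ratio) - 1.0

def matmul(w, h):
    """Plain-Python product of an F x K matrix and a K x N matrix."""
    return [[sum(w[f][k] * h[k][n] for k in range(len(h)))
             for n in range(len(h[0]))] for f in range(len(w))]

def is_cost(v, w, h):
    """Total IS divergence between spectrogram V and the model W.H."""
    model = matmul(w, h)
    return sum(is_divergence(v[f][n], model[f][n])
               for f in range(len(v)) for n in range(len(v[0])))

# Toy example (hypothetical values): 2 frequency bins, 3 frames, 1 component.
W = [[2.0], [1.0]]
H = [[1.0, 2.0, 4.0]]
V = matmul(W, H)               # a spectrogram the model fits exactly
cost_exact = is_cost(V, W, H)  # zero, because every ratio equals 1
```

The divergence depends only on the ratio x/y, so it is scale invariant: `is_divergence(3.0, 2.0)` and `is_divergence(300.0, 200.0)` agree. This property is often cited as a reason the IS divergence suits audio power spectra with a large dynamic range.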


Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine

372 citations


Journal ArticleDOI
TL;DR: This paper describes a model-based expectation-maximization source separation and localization system for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording, and creates probabilistic spectrogram masks that can be used for source separation.
Abstract: This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.

317 citations


01 Jan 2010
TL;DR: In this paper, median filtering is used to separate the harmonic and percussive parts of a monaural audio signal, and the two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram.
Abstract: In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal, with median filtering performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering is also performed across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.

240 citations
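
The median-filtering procedure described in the abstract above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the window size of 3, the shrinking window at the edges, the binary "larger value wins" mask, and all function names are assumptions made for the example. The spectrogram is stored as `spec[f][t]` (frequency bin `f`, frame `t`).

```python
from statistics import median

def median_filter_1d(values, size=3):
    """Median-filter a list; the window shrinks at the edges (assumption)."""
    half = size // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def hpss_masks(spec, size=3):
    """Return (harmonic_mask, percussive_mask) for a magnitude spectrogram."""
    n_freq, n_time = len(spec), len(spec[0])
    # Filtering across frames (along each frequency row) suppresses
    # percussive events and keeps harmonic ridges.
    harm = [median_filter_1d(row, size) for row in spec]
    # Filtering across frequency bins (along each frame) suppresses
    # harmonic ridges and keeps percussive events.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], size)
            for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    # Binary masks (one possible choice): ties go to the harmonic part.
    h_mask = [[1.0 if harm[f][t] >= perc[f][t] else 0.0
               for t in range(n_time)] for f in range(n_freq)]
    p_mask = [[1.0 - m for m in row] for row in h_mask]
    return h_mask, p_mask

def apply_mask(spec, mask):
    """Element-wise product of a spectrogram and a mask."""
    return [[s * m for s, m in zip(srow, mrow)]
            for srow, mrow in zip(spec, mask)]

# Toy spectrogram: a steady tone in bin 2 plus a click in frame 2.
spec = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)]
        for f in range(5)]
h_mask, p_mask = hpss_masks(spec)
harmonic = apply_mask(spec, h_mask)
percussive = apply_mask(spec, p_mask)
```

On this toy input the horizontal line (the tone) ends up in `harmonic` and the vertical line (the click) ends up in `percussive`, with the single shared bin assigned to the harmonic part by the tie rule.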


Proceedings Article
21 Jun 2010
TL;DR: This work develops Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms, derives a mean-field variational inference algorithm, and evaluates GaP-NMF on both synthetic data and recorded music.
Abstract: Recent research in machine learning has focused on breaking audio spectrograms into separate sources of sound using latent variable decompositions. These methods require that the number of sources be specified in advance, which is not always possible. To address this problem, we develop Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms. The assumptions behind GaP-NMF are based on research in signal processing regarding the expected distributions of spectrogram data, and GaP-NMF automatically discovers the number of latent sources. We derive a mean-field variational inference algorithm and evaluate GaP-NMF on both synthetic data and recorded music.

160 citations


01 Jan 2010
TL;DR: In this paper, the authors present recent theoretical and experimental developments on the application to signal reconstruction from a modified magnitude spectrogram of the constraints that an array of complex numbers must verify to be a consistent short-time Fourier transform (STFT) spectrogram.
Abstract: The modification of magnitude spectrograms is at the core of many audio signal processing methods, from source separation to sound modification or noise canceling, and reconstructing a natural sounding signal in such situations is thus a very important issue. This article presents recent theoretical and experimental developments on the application to signal reconstruction from a modified magnitude spectrogram of the constraints that an array of complex numbers must verify to be a consistent short-time Fourier transform (STFT) spectrogram, i.e., to be the STFT spectrogram of an actual real-valued signal. We give here further theoretical insights, present several potential variations on our previously introduced algorithm, investigate various techniques to speed up the signal reconstruction process, and present a thorough experimental comparison of the performance of all the considered algorithms.

80 citations


Journal ArticleDOI
TL;DR: An extensive survey and an algorithm taxonomy are presented, and each algorithm is reviewed according to a set of criteria relating to its success in application, concluding that none of these algorithms fully meets these criteria.

76 citations


Journal ArticleDOI
TL;DR: A novel approach based on sparse linear regression (SLR) is developed, formulated as one of under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM).
Abstract: Frequency hopping (FH) signals have well-documented merits for commercial and military applications due to their near-far resistance and robustness to jamming. Estimating FH signal parameters (e.g., hopping instants, carriers, and amplitudes) is an important and challenging task, but optimum estimation incurs an unrealistic computational burden. The spectrogram has long been the starting non-parametric estimator in this context, followed by line spectra refinements. The problem is that hop timing estimates derived from the spectrogram are coarse and unreliable, thus severely limiting performance. A novel approach is developed in this paper, based on sparse linear regression (SLR). Using a dense frequency grid, the problem is formulated as one of under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM). The SLR-based approach is further broadened to encompass polynomial-phase hopping (PPH) signals, encountered in chirp spread spectrum modulation. Simulations demonstrate that the developed estimator outperforms spectrogram-based alternatives, especially with regard to hop timing estimation, which is the crux of the problem.

75 citations


Journal ArticleDOI
TL;DR: This letter demonstrates that the temporal and spectral structure (spectrogram) of a complex light pulse can be determined by exploiting the ultrafast switching character of a nonthermal photoinduced phase transition.
Abstract: In this letter we demonstrate the possibility to determine the temporal and spectral structure (spectrogram) of a complex light pulse exploiting the ultrafast switching character of a nonthermal photoinduced phase transition. As a proof, we use a VO2 multifilm, undergoing an ultrafast insulator-to-metal phase transition when excited by femtosecond near-infrared laser pulses. The abrupt variation in the multifilm optical properties, over a broad infrared/visible frequency range, is exploited to determine, in situ and in a simple way, the spectrogram of a supercontinuum pulse produced by a photonic crystal fiber. The determination of the structure of the pulse is mandatory to develop pump-probe experiments with frequency resolution over a broad spectral range (700–1100 nm).

66 citations


Book ChapterDOI
21 Jun 2010
TL;DR: It is shown that nonnegative tensor factorization of multichannel spectrograms under PARAFAC structure implicitly assumes a nonpoint-source model, in contrast with usual BSS assumptions, and the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources are clarified.
Abstract: Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by FitzGerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of FitzGerald et al. requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.

60 citations


Journal ArticleDOI
TL;DR: It is shown that SCS is highly robust to noise uncertainty, whereas many other spectrum sensors are not, and that it improves sensitivity by 3 dB for the same dwell time, which is a very significant improvement for this application.
Abstract: This paper proposes a novel, highly effective spectrum sensing algorithm for cognitive radio and white space applications. The proposed spectral covariance sensing (SCS) algorithm exploits the different statistical correlations of the received signal and noise in the frequency domain. Test statistics are computed from the covariance matrix of a partial spectrogram and compared with a decision threshold to determine whether a primary signal of arbitrary type is present or not. This detector is analyzed theoretically and verified through realistic open-source simulations using actual digital television signals captured in the US. Compared to the state of the art in the literature, SCS improves sensitivity by 3 dB for the same dwell time, which is a very significant improvement for this application. Further, it is shown that SCS is highly robust to noise uncertainty, whereas many other spectrum sensors are not.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: An extension of non-negative matrix factorization where the temporal activations become frequency dependent and follow a time-varying autoregressive moving average (ARMA) modeling leads to an efficient single-atom decomposition for a single audio event with strong spectral variation (but with constant pitch).
Abstract: Real world sounds often exhibit non-stationary spectral characteristics such as those produced by a harpsichord or a guitar. The classical Non-negative Matrix Factorization (NMF) needs a number of atoms to accurately decompose the spectrogram of such sounds. An extension of NMF is proposed hereafter which includes time-frequency activations based on ARMA modeling. This leads to an efficient single-atom decomposition for a single audio event. The new algorithm is tested on real audio data and shows promising results.

Book ChapterDOI
27 Sep 2010
TL;DR: In this paper, a sparse representation for polyphonic music signals is presented, which is an extension of nonnegative matrix factorization (NMF) for learning the time-varying spectral patterns of musical instruments, such as attack of the piano or vibrato of the violin, without any prior information.
Abstract: This paper presents a new sparse representation for polyphonic music signals. The goal is to learn the time-varying spectral patterns of musical instruments, such as attack of the piano or vibrato of the violin in polyphonic music signals without any prior information. We model the spectrogram of music signals under the assumption that they are composed of a limited number of components which are composed of Markov-chained spectral patterns. The proposed model is an extension of nonnegative matrix factorization (NMF). An efficient algorithm is derived based on the auxiliary function method.

Book ChapterDOI
27 Sep 2010
TL;DR: In this article, the authors generalize the concept of Wiener filtering to time-frequency masks which can involve manipulation of the phase as well by formulating the problem as a consistency-constrained Maximum-Likelihood one.
Abstract: Wiener filtering is one of the most widely used methods in audio source separation. It is often applied on time-frequency representations of signals, such as the short-time Fourier transform (STFT), to exploit their short-term stationarity, but so far the design of the Wiener time-frequency mask did not take into account the necessity for the output spectrograms to be consistent, i.e., to correspond to the STFT of a time-domain signal. In this paper, we generalize the concept of Wiener filtering to time-frequency masks which can involve manipulation of the phase as well by formulating the problem as a consistency-constrained Maximum-Likelihood one. We present two methods to solve the problem, one looking for the optimal time-domain signal, the other promoting consistency through a penalty function directly in the time-frequency domain. We show through experimental evaluation that, both in oracle conditions and combined with spectral subtraction, our method outperforms classical Wiener filtering.

Journal ArticleDOI
TL;DR: The results indicate that the reduction in power line noise in the ECG signal depends on the filter used, and that the adaptive filter gives the best result.
Abstract: Computer-aided analysis of ECG signals has gained ground over the years, with a tremendous amount of work being carried out all over the world. This paper is a small step in that direction. The electrocardiogram (ECG) is the most commonly known, recognized, and used biomedical signal. It is very sensitive in nature: even a small amount of noise mixed with the original signal changes its characteristics. Data corrupted with noise must be either filtered or discarded, so filtering is an important design consideration for real-time heart monitoring systems. The purpose of this paper is to quantify the relative performance of different filtering methods for power line interference reduction. The database for the performance analysis is created by simulating an ECG signal, since an ideal ECG signal is best for performance analysis; the database is then corrupted with 50 Hz power line interference. The ability of different filters (IIR notch, Wiener, and adaptive) is assessed through changes in the filtered signal, the signal-to-noise ratio, the signal power, the power spectral density, and the spectrogram of the signal. The locations and amplitudes of peaks are also measured with the Pan-Tompkins algorithm for performance analysis of the filters. The results clearly indicate that the reduction in power line noise in the ECG signal depends on the filter, and that the adaptive filter gives the best result, which is easy to see in the spectrogram. The results were obtained using MATLAB and a simulated ECG database.
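
As an illustration of the IIR notch option mentioned in the abstract above, the sketch below implements a textbook second-order notch with zeros on the unit circle at the notch frequency and poles just inside it. It is not the paper's filter design: the 500 Hz sampling rate, the pole radius `r = 0.95`, the test tones, and the function names are all assumptions chosen for the example.

```python
import math

def notch_filter(x, f0, fs, r=0.95):
    """Second-order IIR notch at f0 Hz for a signal sampled at fs Hz.

    Zeros sit on the unit circle at +/- f0; poles sit at radius r (assumed).
    """
    w0 = 2.0 * math.pi * f0 / fs
    b1, b2 = -2.0 * math.cos(w0), 1.0          # numerator (b0 = 1)
    a1, a2 = -2.0 * r * math.cos(w0), r * r    # denominator (a0 = 1)
    y = []
    for n in range(len(x)):
        acc = x[n]
        if n >= 1:
            acc += b1 * x[n - 1] - a1 * y[n - 1]
        if n >= 2:
            acc += b2 * x[n - 2] - a2 * y[n - 2]
        y.append(acc)
    return y

def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

fs = 500.0                                     # assumed sampling rate in Hz
t = [n / fs for n in range(1000)]
hum = [math.sin(2.0 * math.pi * 50.0 * s) for s in t]   # 50 Hz interference
slow = [math.sin(2.0 * math.pi * 5.0 * s) for s in t]   # 5 Hz test tone

hum_out = notch_filter(hum, 50.0, fs)
slow_out = notch_filter(slow, 50.0, fs)
```

After the start-up transient has decayed, the 50 Hz tone is removed almost completely while the 5 Hz tone passes with a gain close to one. A pole radius nearer to 1 narrows the notch at the cost of a longer transient.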

Proceedings ArticleDOI
14 Mar 2010
TL;DR: A new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar that is able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
Abstract: It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods for doing this have been based on camera-based observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have included synthesis of speech from measurements taken by electromyographs and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is farfield and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising - we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.

Journal ArticleDOI
26 Apr 2010-Ethology
TL;DR: In an attempt to minimize observer bias, numerical taxonomy methods were used to describe and classify humpback whale sounds to make studies of animal communication performed by different researchers or on different species more easily comparable.
Abstract: In an attempt to minimize observer bias, numerical taxonomy methods were used to describe and classify humpback whale sounds. The spectrograms (N = 1255) were digitized into a 16 × 21 binary matrix. The rows were 16 frequencies selected on a logarithmic scale (0.12–8 kHz). The columns were 21 time samples taken every 0.1 s. Each point of the matrix was coded 1 if it lay over part of the sound. Other binary variables were added to code for relative intensity within a sound, frequency modulation and amplitude modulation. The sounds were then compared using the Jaccard similarity coefficient for binary data, and classified with average linkage cluster analysis. This technique produced 115 clusters, which were compared with my aural and visual impressions of the sounds. I agreed with most major categories identified by cluster analysis, but many small clusters had to be fused to other categories. This was partially due to the technique used, and to the complexity of the repertoire under study. Improvements are proposed to further reduce observer bias in classification of sounds, and thus make studies of animal communication performed by different researchers or on different species more easily comparable.
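
The comparison step in the abstract above relies on the Jaccard similarity coefficient for binary data: the number of cells coded 1 in both items divided by the number coded 1 in at least one. A minimal sketch follows; the tiny 2 × 3 matrices stand in for the 16 × 21 matrices of the study, the function names are illustrative, and returning 1.0 for two all-zero inputs is an assumed convention. The clustering step is not shown.

```python
def flatten(matrix):
    """Turn a binary matrix (list of rows) into one flat list."""
    return [cell for row in matrix for cell in row]

def jaccard(a, b):
    """Jaccard similarity of two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two empty (all-zero) items count as identical.
    return both / either if either else 1.0

# Toy binary "spectrograms" (hypothetical data).
sound_a = [[1, 1, 0],
           [0, 1, 0]]
sound_b = [[1, 0, 0],
           [0, 1, 1]]
similarity = jaccard(flatten(sound_a), flatten(sound_b))
```

Cells that are 0 in both matrices do not affect the score, which suits sparse codings where most of the time-frequency grid is empty.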

Book ChapterDOI
01 Jan 2010
TL;DR: A simple and fast method to separate a monaural audio signal into harmonic and percussive components is presented as a useful pre-processing step for MIR-related tasks, and its application to automatic chord recognition and rhythm-pattern extraction is shown.
Abstract: In this chapter, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which leads to a useful pre-processing for MIR-related tasks. Exploiting the anisotropies of the power spectrograms of harmonic and percussive components, we define objective functions based on spectrogram gradients, and, applying to them the auxiliary function approach, we derive simple and fast update equations which guarantee the decrease of the objective function at each iteration. We show experimental results for sound separation on popular and jazz music pieces, and also present the application of the proposed technique to automatic chord recognition and rhythm-pattern extraction.

Journal ArticleDOI
TL;DR: This approach combines a preprocessing based on functional principles of the human auditory system and a probabilistic tracking scheme with an algorithm for adaptive frequency range segmentation as well as Bayesian smoothing to derive an efficient framework for estimating formant trajectories.
Abstract: We present a framework for estimating formant trajectories. Its focus is to achieve high robustness in noisy environments. Our approach combines a preprocessing based on functional principles of the human auditory system and a probabilistic tracking scheme. For enhancing the formant structure in spectrograms we use a Gammatone filterbank, a spectral preemphasis, as well as a spectral filtering using difference-of-Gaussians (DoG) operators. Finally, a contrast enhancement mimicking a competition between filter responses is applied. The probabilistic tracking scheme adopts the mixture modeling technique for estimating the joint distribution of formants. In conjunction with an algorithm for adaptive frequency range segmentation as well as Bayesian smoothing an efficient framework for estimating formant trajectories is derived. Comprehensive evaluations of our method on the VTR-formant database emphasize its high precision and robustness. We obtained superior performance compared to existing approaches for clean as well as echoic noisy speech. Finally, an implementation of the framework within the scope of an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.

Proceedings ArticleDOI
01 Dec 2010
TL;DR: The results showed that the proposed feature extraction using Gaussian mixtures of EEG spectrogram yielded better classification results using the KNN classifier.
Abstract: This paper presents the classification of EEG correlates on emotion using features extracted by Gaussian mixtures of EEG spectrogram. This method is compared with three feature extraction methods based on fractal dimension of EEG signal including Higuchi, Minkowski Bouligand, and Fractional Brownian motion. The K nearest neighbor and Support Vector Machine are applied to classify extracted features. The 4 emotional states investigated in this paper are defined using the valence-arousal plane: two valence states (positive and negative) and two arousal states (calm, excited). The accuracy of system to classify 4 emotional states is investigated on EEG collected from 26 subjects (20 to 32 years old) while exposed to emotionally-related visual and audio stimuli. The results showed that the proposed feature extraction using Gaussian mixtures of EEG spectrogram yielded better classification results using the KNN classifier.

Journal ArticleDOI
TL;DR: The model developed characterizes key aspects of the acoustic signal that influence sexual selection while alleviating the need to extract higher‐level signal traits a priori.
Abstract: A major goal of evolutionary biology is to understand the dynamics of natural selection within populations. The strength and direction of selection can be described by regressing relative fitness measurements on organismal traits of ecological significance. However, many important evolutionary characteristics of organisms are complex, and have correspondingly complex relationships to fitness. Secondary sexual characteristics such as mating displays are prime examples of complex traits with important consequences for reproductive success. Typically, researchers atomize sexual traits such as mating signals into a set of measurements including pitch and duration, in order to include them in a statistical analysis. However, these researcher-defined measurements are unlikely to capture all of the relevant phenotypic variation, especially when the sources of selection are incompletely known. In order to accommodate this complexity we propose a Bayesian dimension-reduced spectrogram generalized linear model that directly incorporates representations of the entire phenotype (one-dimensional acoustic signal) into the model as a predictor while accounting for multiple sources of uncertainty. The first stage of dimension reduction is achieved by treating the spectrogram as an "image" and finding its corresponding empirical orthogonal functions. Subsequently, further dimension reduction is accomplished through model selection using stochastic search variable selection. Thus, the model we develop characterizes key aspects of the acoustic signal that influence sexual selection while alleviating the need to extract higher-level signal traits a priori. This facet of our approach is fundamental and has the potential to provide additional biological insight, as is illustrated in our analysis.

Proceedings ArticleDOI
TL;DR: The method of identification is based on the analysis of spectrum dynamics of medium response and has the ability not only to detect the presence of the substance in the sample but to identify it by its 2D signature, which is unique for each investigated substance.
Abstract: A method that makes it possible to obtain the unique 2D signature of a substance, for its identification in the THz frequency range, is developed and applied to the treatment of signals passed through ordinary materials or selected explosives, including those hidden under opaque simulant covers. The method of identification is based on the analysis of the spectrum dynamics (spectrogram) of the medium response and is able not only to detect the presence of the substance in the sample but also to identify it by its 2D signature, which is unique for each investigated substance. It allows the dynamics of many spectral lines to be traced simultaneously in one set of measurements, giving full information about the spectrum dynamics of the measured signal. We show that spectrograms of THz pulses passed through explosives hidden under simulant covers differ widely from spectrograms of the simulants themselves, despite the small difference in their Fourier spectra. Therefore, the method allows hidden substances to be detected and identified with high probability and can be very effective for defense and security applications. The problem of detecting a noisy regular acoustic signal with linear frequency modulation is also examined.

22 Jun 2010
TL;DR: In this article, a new approach is proposed and exploited for complex, multistage gearboxes with a planetary stage: to extract information related to cyclic load variation, an instantaneous speed obtained via a time-frequency spectrogram is used.
Abstract: Condition monitoring of gearboxes via vibration analysis is a well-recognized approach in the scientific literature and in engineering practice. However, in many cases the machine works under non-stationary operating conditions (load and speed variation), which often requires special signal processing and pattern recognition suitable for time-varying systems. One key problem is to identify the variation of external load or speed. Measuring the current consumed by the electric motor, or obtaining the instantaneous speed by processing a tachometer signal, may be difficult or impossible in many practical (industrial) situations. In such cases, non-stationary load variation may be identified by extracting information hidden in the vibration signal, for example from amplitude or frequency demodulation. Unfortunately, both approaches are difficult (or even impossible) for our machines due to the complexity of their design and the wide range of load/speed variation. To avoid these constraints, this paper proposes and exploits a new approach for complex, multistage gearboxes with a planetary stage. To extract information related to cyclic load variation, an instantaneous speed obtained via a time-frequency spectrogram is used. Algorithms for instantaneous frequency (IF) estimation via T-F maps were initially developed by Millioz and Martin. In this paper, a novel procedure for instantaneous speed estimation (based on IF identification by the mentioned automatic algorithm) is proposed, and the procedure is then applied to vibration signals from planetary gearboxes.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: Experimental evaluations show that the proposed algorithm is able to greatly reduce the reverberation effects in even highly reverberant signals captured in auditoria and other open spaces.
Abstract: We present an algorithm to dereverberate single- and multi-channel audio recordings. The proposed algorithm models the magnitude spectrograms of clean audio signals as histograms drawn from a multinomial process. Spectrograms of reverberated signals are obtained as histograms of draws from the PDF of the sum of two random variables, one representing the spectrogram of clean speech and the second the frequency decomposition of the room response. The spectrogram of the clean signal is computed as a maximum-likelihood estimate from the spectrogram of reverberant speech using an EM algorithm. Experimental evaluations show that the proposed algorithm is able to greatly reduce the reverberation effects in even highly reverberant signals captured in auditoria and other open spaces.

Journal ArticleDOI
TL;DR: A vector Brillouin optical time-domain analyzer that has a high immunity level to noise and features a phase spectrogram capability, well suited for complex situations involving several acoustic resonances, such as high-order longitudinal modes.
Abstract: Thanks to a double-frequency phase modulation scheme, we report a vector Brillouin optical time-domain analyzer (BOTDA). This BOTDA has a high immunity level to noise, and it features a phase spectrogram capability. It is well suited for complex situations involving several acoustic resonances, such as high-order longitudinal modes. It has notably been used to characterize a dispersion-shifted fiber, allowing us to report spectrograms with multiple acoustic resonances. A very high 57 dB dynamic range is also reported for 100-ns-long pulses simultaneously with a 16 cm numerical resolution.


Journal ArticleDOI
TL;DR: A novel coherent spectrogram redistribution method, coherent single range Doppler interferometry (CSRDI), is proposed, which is capable of generating high-resolution imagery by applying a phase matched processing and performs well at low signal-to-noise ratio.
Abstract: This study focuses on the narrow-band radar imaging for high-speed spinning targets. Based on the time-frequency characteristic of the echoed signal, a novel coherent spectrogram redistribution method, coherent single range Doppler interferometry (CSRDI), is proposed, which is capable of generating high-resolution imagery by applying a phase matched processing. Furthermore, the approach performs well at low signal-to-noise ratio. The spinning rate error is also taken into consideration and an estimation approach based on the focal entropy is proposed. The validity is confirmed by real data and numerical simulations.

Journal ArticleDOI
TL;DR: An algorithm based on a short-term representation of the fractional Fourier transform is presented; it is highly suited to signals that contain multiple non-stationary components, including a synthetic signal and a bat echolocation signal.

Journal ArticleDOI
TL;DR: The short-time Fourier transform (STFT) was employed in the ECG filtering stage, and a narrow rectangular window was used to transform ECG signals into the time-frequency domain for QRS complex detection.
Abstract: This paper reports our study of QRS complex detection. The short-time Fourier transform (STFT) was employed in the ECG filtering stage. A narrow rectangular window was used to transform ECG signals into the time-frequency domain. The temporal information at 45 Hz from the spectrogram was analyzed to detect QRS locations. An automated thresholding method combined with local maxima finding was modified to find the QRS locations. The data used in this study come from the MIT-BIH Arrhythmia database. As a result, the proposed technique achieved a detection rate better than 99%, with a fail ratio of 1.3%.
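
The pipeline described in this abstract (a narrow rectangular window, the 45 Hz spectrogram row, thresholding plus local maxima) can be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' implementation: the 16-sample window, the threshold at half the maximum, the 200 ms refractory period, the function names, and the synthetic pulse train standing in for an ECG are all hypothetical. The 360 Hz sampling rate matches that of MIT-BIH Arrhythmia recordings.

```python
import cmath
import math


def band_envelope(signal, fs, win_len, freq_hz):
    """Magnitude of one rectangular-window STFT bin, for every window start."""
    k = round(freq_hz * win_len / fs)  # nearest DFT bin to freq_hz
    kernel = [cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len)]
    return [
        abs(sum(signal[i + n] * kernel[n] for n in range(win_len)))
        for i in range(len(signal) - win_len + 1)
    ]


def detect_peaks(envelope, threshold, refractory):
    """Keep the strongest above-threshold sample within each refractory period."""
    peaks = []
    for i, value in enumerate(envelope):
        if value < threshold:
            continue
        if peaks and i - peaks[-1] < refractory:
            if value > envelope[peaks[-1]]:
                peaks[-1] = i  # a stronger sample replaces the earlier candidate
        else:
            peaks.append(i)
    return peaks


# Hypothetical ECG stand-in: narrow Gaussian pulses on a slow baseline drift.
FS, WIN_LEN = 360, 16  # 16 samples at 360 Hz puts DFT bin 2 exactly at 45 Hz
beat_samples = [180, 468, 756]
ecg = [
    0.3 * math.sin(2 * math.pi * 0.5 * n / FS)
    + sum(math.exp(-0.5 * ((n - b) / 1.8) ** 2) for b in beat_samples)
    for n in range(3 * FS)
]
envelope = band_envelope(ecg, FS, WIN_LEN, 45.0)
starts = detect_peaks(envelope, 0.5 * max(envelope), refractory=FS // 5)
qrs_samples = [s + WIN_LEN // 2 for s in starts]  # report window centres
```

Because the window is rectangular and short, the slow baseline drift contributes very little to the 45 Hz bin, while the sharp pulses contribute a lot, which is what makes a simple threshold workable in this toy setting. Real ECG data would call for tuning the window length and threshold.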

Proceedings Article
26 Sep 2010
TL;DR: A novel method is presented that achieves high resolution simultaneously in both time and frequency, the “super-resolution spectrogram”, which can be particularly useful for speech as it can simultaneously resolve both glottal pulses and individual harmonics.
Abstract: The short-time Fourier transform (STFT) based spectrogram is commonly used to analyze the time-frequency content of a signal. Depending on window size, the STFT provides a trade-off between time and frequency resolutions. This paper presents a novel method that achieves high resolution simultaneously in both time and frequency. We extend Probabilistic Latent Component Analysis (PLCA) to jointly decompose two spectrograms, one with a high time resolution and one with a high frequency resolution. Using this decomposition, a new spectrogram, maintaining high resolution in both time and frequency, is constructed. Termed the “super-resolution spectrogram”, it can be particularly useful for speech as it can simultaneously resolve both glottal pulses and individual harmonics.
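
The window-size trade-off mentioned in this abstract can be made concrete with a short calculation. The sketch below only computes the nominal time span and DFT bin spacing of two STFT windows; it does not implement the PLCA-based method. The sampling rate and window lengths are assumed values chosen for illustration.

```python
import math


def stft_resolution(fs, win_len):
    """Return (window duration in s, DFT bin spacing in Hz) for an STFT window."""
    return win_len / fs, fs / win_len


FS = 16000  # hypothetical wide-band speech sampling rate
short_dt, short_df = stft_resolution(FS, 64)    # fine in time, coarse in frequency
long_dt, long_df = stft_resolution(FS, 1024)    # coarse in time, fine in frequency
```

With these assumed settings the short window spans 4 ms with 250 Hz between bins, and the long window spans 64 ms with 15.625 Hz between bins. The product of the two quantities is 1 in both cases, so improving one resolution by changing the window length necessarily degrades the other.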