Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal Article
TL;DR: Simulation results show that the adaptive window zero-crossing-based IF estimation method is superior to fixed window methods and is also better than adaptive spectrogram and adaptive Wigner-Ville distribution (WVD)-based IF estimators for different signal-to-noise ratio (SNR).
Abstract: We address the problem of estimating instantaneous frequency (IF) of a real-valued constant amplitude time-varying sinusoid. Estimation of polynomial IF is formulated using the zero-crossings of the signal. We propose an algorithm to estimate nonpolynomial IF by local approximation using a low-order polynomial, over a short segment of the signal. This involves the choice of window length to minimize the mean square error (MSE). The optimal window length found by directly minimizing the MSE is a function of the higher-order derivatives of the IF which are not available a priori. However, an optimum solution is formulated using an adaptive window technique based on the concept of intersection of confidence intervals. The adaptive algorithm enables minimum MSE-IF (MMSE-IF) estimation without requiring a priori information about the IF. Simulation results show that the adaptive window zero-crossing-based IF estimation method is superior to fixed window methods and is also better than adaptive spectrogram and adaptive Wigner-Ville distribution (WVD)-based IF estimators for different signal-to-noise ratio (SNR).

27 citations
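
The zero-crossing idea in the abstract above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a fixed window and a locally constant frequency, not the adaptive intersection-of-confidence-intervals window the paper proposes. The function names (`zero_crossing_times`, `windowed_if`) and the test tone are hypothetical.

```python
import math

def zero_crossing_times(x, fs):
    """Times (seconds) at which x changes sign, refined by linear interpolation."""
    times = []
    for n in range(len(x) - 1):
        a, b = x[n], x[n + 1]
        if a == 0.0:
            times.append(n / fs)                  # sample sits exactly on a crossing
        elif a * b < 0.0:
            times.append((n + a / (a - b)) / fs)  # zero of the line through both samples
    return times

def windowed_if(times, m):
    """Fixed-window IF estimates from groups of m consecutive crossing intervals.

    A sinusoid crosses zero twice per period, so m intervals spanning T seconds
    correspond to a frequency of m / (2 * T). Returns (centre_time, frequency) pairs.
    """
    estimates = []
    for k in range(len(times) - m):
        span = times[k + m] - times[k]
        estimates.append((0.5 * (times[k] + times[k + m]), m / (2.0 * span)))
    return estimates

# Assumed test signal: a constant-amplitude 440 Hz tone sampled at 8 kHz.
fs = 8000.0
f0 = 440.0
x = [math.sin(2 * math.pi * f0 * n / fs + 0.3) for n in range(800)]

times = zero_crossing_times(x, fs)
f_hat = (len(times) - 1) / (2.0 * (times[-1] - times[0]))  # whole-record estimate
local = windowed_if(times, 8)                              # short-window estimates
```

A larger `m` averages over more crossings and lowers the variance, but it blurs a rapidly changing IF. That bias-variance trade-off is what the adaptive window in the paper is meant to resolve.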

Journal Article
TL;DR: This paper describes a method of producing artificial speech from a phonetic input, i.e., symbols representing the names of phonemes corresponding to a given text are fed into a machine and the acoustic waveforms of connected speech emerge.
Abstract: This paper describes a method of producing artificial speech from a phonetic input, i.e., symbols representing the names of phonemes corresponding to a given text are fed into a machine and the acoustic waveforms of connected speech emerge. The experimental work was accomplished on an electronic computer (IBM 7090), but the scheme is simple enough to permit realization with analog hardware. The talking machine program is divided into two parts. The first part simulates a more or less conventional resonance synthesizer of the tandem variety, requiring nine control signals; buzz intensity, hiss intensity, pitch, plus the center frequencies and bandwidths of three formants. Initially, this part of the program was used alone in experiments for which the inputs were detailed specifications of the control signals derived from spectrograms and physiological data, sampled at approximately three times the phonemic rate. Results from this phase were later combined with known results in speech perception to produce ...

27 citations

Journal Article
TL;DR: The white-noise characteristics of the AR modelling error signal indicated that the Doppler blood-flow signal can be adequately modelled as a complex AR process, and, with appropriate model orders, AR modelling provided better Doppler spectrogram estimates than the periodogram.
Abstract: Doppler spectrograms obtained by using autoregressive (AR) modelling based on the Yule-Walker equations were investigated. A complex AR model using the in-phase and the quadrature components of the Doppler signal was used to provide blood-flow directions. The effect of model orders on the spectrogram estimation was studied using cardiac Doppler blood flow signals taken from 20 patients. The 'final prediction error' (FPE) and the 'Akaike's information criterion' (AIC) provided almost identical results in model-order selection. An index, the spectral envelope area (SEA), was used to evaluate the effect of window duration and sampling frequency on AR Doppler spectrogram estimation. The statistical analysis revealed that the SEA obtained from AR modelling was not sensitive to window duration and sampling frequency. This result verified the consistency of the AR Doppler spectrogram. The white-noise characteristics of the AR modelling error signal indicated that the Doppler blood-flow signal can be adequately modelled as a complex AR process. With appropriate model orders, AR modelling provided better Doppler spectrogram estimates than the periodogram.

27 citations
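
A minimal sketch of an AR spectrogram based on the Yule-Walker equations, as described in the abstract above, might look as follows. It rests on several assumptions: the signal is real-valued (the paper uses a complex AR model on the in-phase and quadrature components), the Yule-Walker equations are solved with the Levinson-Durbin recursion, and `fpe` and `aic` use common textbook forms that may differ from the paper's. All function names and the test signal are hypothetical.

```python
import cmath
import math
import random

def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the model
    x[n] + a[1] x[n-1] + ... + a[p] x[n-p] = e[n].
    Returns (a, err) with a[0] = 1 and err the prediction-error variance."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def ar_psd(a, err, freqs, fs):
    """AR power spectral density err / (fs * |A(f)|^2) at the given frequencies (Hz)."""
    psd = []
    for f in freqs:
        denom = sum(a[k] * cmath.exp(-2j * math.pi * f * k / fs)
                    for k in range(len(a)))
        psd.append(err / (fs * abs(denom) ** 2))
    return psd

def ar_spectrogram(x, fs, order, win, hop, freqs):
    """One AR spectrum per sliding window; returns a list of PSD rows."""
    rows = []
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start:start + win]
        a, err = levinson_durbin(autocorr(seg, order), order)
        rows.append(ar_psd(a, err, freqs, fs))
    return rows

def fpe(err, n, order):
    """Final prediction error (textbook form, assumed)."""
    return err * (n + order + 1) / (n - order - 1)

def aic(err, n, order):
    """Akaike's information criterion (textbook form, assumed)."""
    return n * math.log(err) + 2 * order

# Assumed test signal: a 1 kHz tone in white noise, sampled at 8 kHz.
random.seed(0)
fs = 8000.0
x = [math.sin(2 * math.pi * 1000.0 * n / fs) + 0.1 * random.gauss(0.0, 1.0)
     for n in range(2048)]
freqs = [50.0 * k for k in range(81)]            # 0 Hz to 4 kHz in 50 Hz steps
spec = ar_spectrogram(x, fs, order=4, win=256, hop=128, freqs=freqs)
peaks = [freqs[max(range(len(row)), key=row.__getitem__)] for row in spec]
```

To compare model orders in this sketch, one would evaluate `fpe` or `aic` on the `err` returned by `levinson_durbin` for each candidate order and keep the order with the smallest value.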

Proceedings Article
01 Jan 2012
TL;DR: This study computes a local representation of the speech spectrogram, treated as the raw “signal”, and uses it as the local sparse code in a standard phone classification task; an examination of the learned dictionary shows that its entries capture meaningful acoustic-phonetic properties.
Abstract: We propose a novel approach to acoustic modeling based on recent advances in sparse representations. The key idea in sparse coding is to compute a compressed local representation of a signal via an over-complete basis or dictionary that is learned in an unsupervised way. In this study, we compute the local representation on speech spectrogram as the raw “signal” and use it as the local sparse code to perform a standard phone classification task. A linear classifier is used that directly receives the coding space for making the classification decision. The simplicity of the linear classifier allows us to assess whether the sparse representations are sufficiently rich to serve as effective acoustic features for discriminating speech classes. Our experiments demonstrate competitive error rates when compared to other shallow approaches. An examination of the dictionary learned in sparse feature extraction demonstrates meaningful acoustic-phonetic properties that are captured by a collection of the dictionary entries.

27 citations
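
The coding step described in the abstract above, representing a spectrogram patch as a sparse combination of entries from an over-complete dictionary, can be illustrated with plain matching pursuit. This is a stand-in chosen for brevity: the abstract does not say which sparse solver the authors use, and the dictionary here is hand-written rather than learned. The names `matching_pursuit`, `atoms`, and `patch` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(x, atoms, n_nonzero):
    """Greedy sparse code of x over unit-norm atoms.

    At each step the atom most correlated with the residual is selected,
    its coefficient is accumulated, and its contribution is subtracted.
    Returns (code, residual)."""
    residual = list(x)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corr = [dot(residual, d) for d in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        code[j] += corr[j]
        residual = [r - corr[j] * dj for r, dj in zip(residual, atoms[j])]
    return code, residual

# Toy over-complete dictionary: 4 atoms in a 3-dimensional "patch" space.
atoms = [normalize(v) for v in ([1.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0],
                                [0.0, 0.0, 1.0],
                                [1.0, 1.0, 0.0])]
patch = [1.0, 0.0, 3.0]                 # stands in for a flattened spectrogram patch
code, residual = matching_pursuit(patch, atoms, n_nonzero=2)
```

In a pipeline like the one the abstract outlines, the resulting `code` vector would be the feature passed to the linear classifier.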

Journal Article
TL;DR: This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries, which results in improved word error rates for speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems.
Abstract: Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performances on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep-neural network (DNN) based systems.

27 citations
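
The coupled-dictionary idea in the abstract above can be sketched as follows: weights are found with the input dictionary in a low-dimensional space, then the same weights are applied to the coupled output dictionary to obtain full-resolution speech and noise estimates, from which a time-varying gain is formed. Everything concrete here is an assumption made for illustration: the Euclidean multiplicative update (the paper's cost function is not given in the abstract), the Wiener-style gain, the one-atom toy dictionaries, and all names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nonneg_weights(v, atoms, n_iter=200, eps=1e-12):
    """Nonnegative weights h such that sum_j h[j] * atoms[j] approximates v.

    Uses multiplicative updates for the Euclidean cost; v and the atoms
    are assumed to be nonnegative."""
    h = [1.0] * len(atoms)
    b = [dot(a, v) for a in atoms]                          # W^T v
    gram = [[dot(ai, aj) for aj in atoms] for ai in atoms]  # W^T W
    for _ in range(n_iter):
        gh = [sum(g * hj for g, hj in zip(row, h)) for row in gram]
        h = [hi * bi / (ghi + eps) for hi, bi, ghi in zip(h, b, gh)]
    return h

def coupled_gain(noisy_in, speech_in, noise_in, speech_out, noise_out, eps=1e-12):
    """Decompose in the input space, reconstruct in the output space.

    Returns (gain, h): a per-bin gain in [0, 1] at full resolution and the weights."""
    h = nonneg_weights(noisy_in, speech_in + noise_in)
    hs, hn = h[:len(speech_in)], h[len(speech_in):]
    n_bins = len(speech_out[0])
    s = [sum(w * atom[k] for w, atom in zip(hs, speech_out)) for k in range(n_bins)]
    n = [sum(w * atom[k] for w, atom in zip(hn, noise_out)) for k in range(n_bins)]
    return [sk / (sk + nk + eps) for sk, nk in zip(s, n)], h

# Toy coupled dictionaries: 2-dimensional input space, 4-bin output space.
speech_in = [[1.0, 0.2]]
noise_in = [[0.1, 1.0]]
speech_out = [[1.0, 0.5, 0.0, 0.0]]     # coupled to speech_in[0]
noise_out = [[0.0, 0.0, 1.0, 1.0]]      # coupled to noise_in[0]

noisy_in = [2.1, 1.4]                   # equals 2 * speech atom + 1 * noise atom
gain, h = coupled_gain(noisy_in, speech_in, noise_in, speech_out, noise_out)
```

Multiplying the full-resolution noisy spectrum by `gain` bin by bin would give the enhanced spectrum in this simplified setting.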


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593