Topic
Spectrogram
About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published on this topic, receiving 81,547 citations.
Papers
14 May 2006
TL;DR: This feature describes the amplitude modulation spectrum of each subband and yields a single feature vector per utterance, which is used directly as the speaker's modulation-frequency template, eliminating the need for a separate training phase.
Abstract: We propose a method for computing a joint acoustic-modulation frequency feature for speaker recognition. This feature describes the amplitude modulation spectrum of each subband and results in a single feature vector per utterance. This vector is used directly as the speaker's modulation-frequency template, eliminating the need for a separate training phase. The effects of analysis parameters and pattern matching are studied using the NIST 2001 corpus. When the proposed feature is fused with the baseline MFCC/GMM system, the EER is reduced from 18.2% to 16.7%.
41 citations
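The feature above can be sketched in a few lines: a short-time Fourier transform gives each subband's envelope over time, and a second FFT along the time axis of each envelope gives that subband's amplitude modulation spectrum. The following is a minimal NumPy illustration; the function name and the window, hop, and modulation-FFT sizes are placeholders, not the paper's settings.

```python
import numpy as np

def modulation_spectrum(signal, n_fft=256, hop=128, n_mod=64):
    """Sketch of a joint acoustic-modulation frequency feature:
    (1) an STFT gives each subband's magnitude envelope over time;
    (2) an FFT along the time axis of each envelope gives its
        amplitude modulation spectrum.
    All parameter values are illustrative, not the paper's."""
    # Frame the signal and compute the magnitude spectrogram (acoustic axis).
    win = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i*hop : i*hop + n_fft] * win
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))       # (frames, subbands)
    # Modulation spectrum: FFT along time for each subband envelope,
    # after removing the per-subband mean (DC).
    env = spec - spec.mean(axis=0)
    mod = np.abs(np.fft.rfft(env, n=n_mod, axis=0))  # (mod bins, subbands)
    # Flatten to the single per-utterance vector used as the template.
    return mod.flatten()

sr = 8000
t = np.arange(sr) / sr
# 1 kHz carrier, amplitude-modulated at 4 Hz.
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
feat = modulation_spectrum(x)
```

Flattening the joint acoustic-modulation matrix yields the single per-utterance vector that serves as the speaker template, so no separate model training is needed before matching.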
TL;DR: Using the learned mappings in the generalized cross-correlation framework, improved localization performance is demonstrated and the resulting mappings exhibit behavior consistent with the well-known precedence effect from psychoacoustic studies.
Abstract: Speech source localization in reverberant environments has proved difficult for automated microphone array systems. Because of its nonstationary nature, certain features observable in the reverberant speech signal, such as sudden increases in audio energy, provide cues to indicate time-frequency regions that are particularly useful for audio localization. We exploit these cues by learning a mapping from reverberated signal spectrograms to localization precision using ridge regression. Using the learned mappings in the generalized cross-correlation framework, we demonstrate improved localization performance. Additionally, the resulting mappings exhibit behavior consistent with the well-known precedence effect from psychoacoustic studies.
41 citations
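The generalized cross-correlation framework the paper builds on weights the cross-power spectrum of two microphone signals before inverse-transforming and picking the delay peak. Below is a minimal sketch using the classical PHAT weighting as the default; the paper's learned, ridge-regression-derived weights would instead be supplied through the `weights` argument, which is a hypothetical hook for illustration.

```python
import numpy as np

def gcc(x1, x2, weights=None):
    """Generalized cross-correlation delay estimate between two mics.
    With weights = 1/|X1* X2| this is GCC-PHAT; the paper instead
    learns per-frequency weights from spectrogram features."""
    n = len(x1) + len(x2)                    # zero-pad to avoid wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2                 # cross-power spectrum
    if weights is None:                      # default: PHAT weighting
        weights = 1.0 / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross * weights, n)
    # Peak index (mod n) gives the delay of x2 relative to x1 in samples.
    lag = np.argmax(cc)
    return lag if lag < n // 2 else lag - n

rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 7
x1 = s
x2 = np.concatenate([np.zeros(delay), s[:-delay]])  # x2 lags x1 by 7 samples
print(gcc(x1, x2))
```

Learning which time-frequency regions to trust, as the paper does, amounts to replacing the fixed PHAT weights with data-driven ones that emphasize onset-dominated regions.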
TL;DR: Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.
41 citations
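Spectral subtraction, one component of the front-end above, can be illustrated on its own: estimate an average noise magnitude spectrum from a noise-only excerpt, subtract it from each noisy frame's magnitude, keep the noisy phase, and resynthesize by overlap-add. This is a textbook sketch, not the paper's integrated front-end; all parameter values are illustrative.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, n_fft=256, hop=128, floor=0.02):
    """Textbook spectral-subtraction sketch: magnitude subtraction with
    a spectral floor, noisy phase reuse, and overlap-add resynthesis."""
    win = np.hanning(n_fft)

    def frame(x):
        n = 1 + (len(x) - n_fft) // hop
        return np.stack([x[i*hop : i*hop + n_fft] * win for i in range(n)])

    # Average noise magnitude over the noise-only frames.
    noise_mag = np.abs(np.fft.rfft(frame(noise_est), axis=1)).mean(axis=0)
    spec = np.fft.rfft(frame(noisy), axis=1)
    # Subtract the noise estimate, flooring to avoid negative magnitudes.
    mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
    frames = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft, axis=1)
    # Overlap-add with window-squared normalization.
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i, f in enumerate(frames):
        out[i*hop : i*hop + n_fft] += f * win
        norm[i*hop : i*hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

In the paper's front-end this kind of noise reduction is combined with robust fundamental-frequency and voicing estimation before the speech is reconstructed.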
TL;DR: Wang et al. propose a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition, in which a deep CNN learns segment-level features from image-like three-channel spectrograms.
Abstract: Recently, emotion recognition in real-world settings such as in the wild has attracted extensive attention in affective computing, because spontaneous emotions in such settings are more difficult to identify than other emotions. Motivated by the diverse effects of different lengths of audio spectrograms on emotion identification, this paper proposes a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition. Initially, a deep convolutional neural network (CNN) model is used to learn deep segment-level features from the created image-like three-channel spectrograms. Then, a deep LSTM model is applied to the learned segment-level CNN features to capture the temporal dependency among all divided segments in an utterance for utterance-level emotion recognition. Finally, the emotion recognition results obtained by combining CNN with LSTM at multiple segment-level spectrogram lengths are integrated using a score-level fusion strategy. Experimental results on two challenging spontaneous emotional datasets, the AFEW5.0 and BAUM-1s databases, demonstrate the promising performance of the proposed method, outperforming state-of-the-art methods.
41 citations
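The final score-level fusion step can be illustrated independently of the CNN-LSTM models: each segment-length-specific model produces a class-score vector for the utterance, and a weighted average of those vectors gives the fused decision. The scores and uniform weights below are hypothetical; the paper's exact fusion rule may differ.

```python
import numpy as np

def fuse_scores(score_list, weights=None):
    """Score-level fusion sketch: weighted average of per-model class
    scores, followed by an argmax decision."""
    # Each row: one model's class-score vector for the utterance
    # (here, the CNN-LSTM run at one segment-level spectrogram length).
    scores = np.asarray(score_list, dtype=float)   # (n_scales, n_classes)
    if weights is None:                            # default: uniform weights
        weights = np.full(len(scores), 1.0 / len(scores))
    fused = weights @ scores                       # (n_classes,)
    return fused, int(np.argmax(fused))

# Hypothetical posteriors over four emotion classes at three segment lengths.
fused, label = fuse_scores([[0.10, 0.60, 0.20, 0.10],
                            [0.05, 0.40, 0.45, 0.10],
                            [0.10, 0.55, 0.25, 0.10]])
```

Fusing across segment lengths lets scales that happen to capture the emotion-bearing portion of an utterance outvote scales that miss it.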
26 May 2011
TL;DR: This book surveys methods for speech spectrum analysis, covering the Fourier power spectrum and spectrogram, wavelet and other time-frequency representations, reassigned spectrograms, linear prediction, the cepstrum, and formant tracking.
Abstract: Contents: Introduction; Historical perspective on speech spectrum analysis; The Fourier power spectrum and spectrogram; Other time-frequency and wavelet representations; The new frontier: reassigned spectrograms and power spectra; Linear prediction of the speech spectrum; Homomorphic analysis and the cepstrum; Formant tracking methods.
41 citations
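The book's starting point, the short-time Fourier power spectrogram, reduces to three operations: window the signal into overlapping frames, FFT each frame, and take the squared magnitude. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def power_spectrogram(x, n_fft=256, hop=64):
    """Textbook short-time Fourier power spectrogram:
    window -> FFT -> squared magnitude, frame by frame."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop : i*hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, freq bins)

sr = 8000
t = np.arange(sr) / sr
S = power_spectrogram(np.sin(2 * np.pi * 1000 * t))
# A 1 kHz tone concentrates energy near bin 1000 / (sr / n_fft) = 32.
```

The later chapters (wavelets, reassignment, linear prediction, the cepstrum) are all refinements of, or alternatives to, this basic time-frequency picture.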