scispace - formally typeset
Topic

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5813 publications have been published within this topic receiving 81547 citations.
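For context, a spectrogram is conventionally computed as the squared magnitude of a short-time Fourier transform over windowed frames. A minimal NumPy sketch (the window length and hop size here are illustrative choices, not a standard):

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude-squared STFT of a 1-D signal using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One-sided power spectrum per frame: shape (n_frames, n_fft//2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# A 1 kHz tone sampled at 8 kHz concentrates power in bin 1000/8000*256 = 32
fs = 8000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
```

Each row of `S` is one time frame; plotting `10*log10(S)` against frame index and frequency bin gives the familiar spectrogram image.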


Papers
Proceedings ArticleDOI
14 May 2006
TL;DR: The feature describes the amplitude modulation spectrum of each subband and yields a single feature vector per utterance, which is used directly as the speaker's modulation frequency template, eliminating the need for a separate training phase.
Abstract: We propose a method for computing a joint acoustic-modulation frequency feature for speaker recognition. The feature describes the amplitude modulation spectrum of each subband and yields a single feature vector per utterance. This vector is used directly as the speaker's modulation frequency template, eliminating the need for a separate training phase. The effects of the analysis parameters and of the pattern-matching method are studied on the NIST 2001 corpus. When the proposed feature is fused with the baseline MFCC/GMM system, the EER drops from 18.2% to 16.7%.

41 citations
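The paper's exact feature is not reproduced here, but the general recipe for a joint acoustic-modulation frequency feature — a second spectral analysis applied to each subband's amplitude envelope, flattened to one vector per utterance — can be sketched as follows. The subband decomposition (plain STFT bins) and all parameter values are assumptions for illustration, not the authors' configuration:

```python
import numpy as np

def modulation_feature(x, n_fft=256, hop=64, mod_fft=64):
    """Sketch of a joint acoustic-modulation frequency feature:
    for each acoustic subband (STFT bin), take the magnitude spectrum
    of its amplitude envelope across frames, then flatten the joint
    matrix into a single per-utterance vector."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    env = np.abs(np.fft.rfft(frames, axis=1))   # subband envelopes over time
    env -= env.mean(axis=0)                     # remove each envelope's DC
    # Second FFT along the frame axis (n=mod_fft truncates/pads the envelope)
    mod = np.abs(np.fft.rfft(env, n=mod_fft, axis=0))
    return mod.ravel()                          # single vector per utterance

fs = 8000
t = np.arange(fs) / fs
# 1 kHz carrier, amplitude-modulated at 8 Hz
x = (1 + 0.5 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)
v = modulation_feature(x)
```

Because the feature is a fixed-length vector per utterance, two utterances can be compared directly (e.g. by a distance between vectors), which is what removes the separate training phase.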

Journal ArticleDOI
TL;DR: Using the learned mappings in the generalized cross-correlation framework, improved localization performance is demonstrated and the resulting mappings exhibit behavior consistent with the well-known precedence effect from psychoacoustic studies.
Abstract: Speech source localization in reverberant environments has proved difficult for automated microphone array systems. Because of its nonstationary nature, certain features observable in the reverberant speech signal, such as sudden increases in audio energy, provide cues to indicate time-frequency regions that are particularly useful for audio localization. We exploit these cues by learning a mapping from reverberated signal spectrograms to localization precision using ridge regression. Using the learned mappings in the generalized cross-correlation framework, we demonstrate improved localization performance. Additionally, the resulting mappings exhibit behavior consistent with the well-known precedence effect from psychoacoustic studies.

41 citations
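The generalized cross-correlation framework referenced above is most often instantiated with PHAT weighting. A minimal sketch of plain GCC-PHAT delay estimation between two microphone signals (the paper's learned spectrogram-to-precision mapping is not reproduced here):

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the time delay of y relative to x via GCC-PHAT."""
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12              # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
    return (np.argmax(cc) - n // 2) / fs

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
d = 5                                   # ground-truth delay in samples
y = np.concatenate([np.zeros(d), x[:-d]])
tau = gcc_phat(x, y, fs=8000)
```

The PHAT whitening discards magnitude information, which is what makes the estimator relatively robust to reverberation; the paper's contribution is to weight time-frequency regions within this framework rather than treat them uniformly.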

Journal ArticleDOI
TL;DR: Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.

41 citations
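Spectral subtraction, the enhancement step named in the TL;DR, is commonly implemented by subtracting a scaled noise-magnitude estimate from each frame's magnitude spectrum and clamping the result to a spectral floor. A generic sketch (the parameter values are illustrative, not the paper's):

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Magnitude-domain spectral subtraction with a spectral floor:
    over-subtract the noise estimate (alpha), then clamp to a small
    fraction of the noisy magnitude (beta) to avoid negative values."""
    cleaned = noisy_mag - alpha * noise_mag
    return np.maximum(cleaned, beta * noisy_mag)

noisy = np.array([1.0, 0.5, 0.1])   # per-bin magnitudes of one frame
noise = np.array([0.2, 0.2, 0.2])   # noise magnitude estimate
out = spectral_subtract(noisy, noise)
```

The floor parameter trades residual noise against "musical noise" artifacts; the noisy phase is typically reused when resynthesizing the time-domain signal.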

Journal ArticleDOI
TL;DR: Wang et al. propose a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition, in which a deep CNN first learns segment-level features from image-like three-channel spectrograms.
Abstract: Recently, emotion recognition in real sceneries such as in the wild has attracted extensive attention in affective computing, because existing spontaneous emotions in real sceneries are more challenging and difficult to identify than other emotions. Motivated by the diverse effects of different lengths of audio spectrograms on emotion identification, this paper proposes a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition. Initially, a deep convolutional neural network (CNN) model is used to learn deep segment-level features on the basis of the created image-like three channels of spectrograms. Then, a deep LSTM model is adopted on the basis of the learned segment-level CNN features to capture the temporal dependency among all divided segments in an utterance for utterance-level emotion recognition. Finally, different emotion recognition results, obtained by combining CNN with LSTM at multiple lengths of segment-level spectrograms, are integrated by using a score-level fusion strategy. Experimental results on two challenging spontaneous emotional datasets, i.e., the AFEW5.0 and BAUM-1s databases, demonstrate the promising performance of the proposed method, outperforming state-of-the-art methods.

41 citations
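The score-level fusion step in the framework above is commonly a weighted average of per-class score vectors from the individual models, with the fused prediction taken as the argmax. A generic sketch (the two score vectors below are hypothetical, standing in for models trained at different segment lengths):

```python
import numpy as np

def fuse_scores(score_vectors, weights=None):
    """Score-level fusion: weighted average of per-class score vectors
    from several classifiers; the fused prediction is the argmax."""
    scores = np.stack(score_vectors)        # shape (n_models, n_classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)
    fused = np.average(scores, axis=0, weights=weights)
    return fused, int(np.argmax(fused))

# Hypothetical per-class scores from two segment-length models
short_seg = np.array([0.6, 0.3, 0.1])
long_seg = np.array([0.1, 0.6, 0.3])
fused, pred = fuse_scores([short_seg, long_seg])
```

Fusing at the score level (rather than concatenating features) lets each model keep its own architecture and input scale, which is what makes the multiscale combination straightforward.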

Book
26 May 2011
TL;DR: A book-length treatment of speech spectrum analysis, covering the Fourier power spectrum and spectrogram, wavelet and other time-frequency representations, reassigned spectrograms, linear prediction, homomorphic (cepstral) analysis, and formant tracking.
Abstract (table of contents): Introduction - Historical perspective on speech spectrum analysis - The Fourier power spectrum and spectrogram - Other time-frequency and wavelet representations - The new frontier: reassigned spectrograms and power spectra - Linear prediction of the speech spectrum - Homomorphic analysis and the cepstrum - Formant tracking methods.

41 citations
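The central tool of the homomorphic-analysis chapter, the real cepstrum, can be sketched in a few lines: it is the inverse FFT of the log magnitude spectrum, and peaks at nonzero quefrency reveal periodicity such as pitch (the small regularization constant below is illustrative):

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Peaks at nonzero quefrency indicate periodicity in the signal."""
    mag = np.abs(np.fft.fft(x))
    return np.fft.ifft(np.log(mag + 1e-12)).real

# An impulse train with a 50-sample period gives a cepstral peak
# at quefrency 50 (and its multiples)
x = np.zeros(1000)
x[::50] = 1.0
c = real_cepstrum(x)
```

Taking the log turns the source-filter product in the spectrum into a sum, which is why the slowly varying vocal-tract envelope and the fine periodic structure separate along the quefrency axis.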


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
79% related
Convolutional neural network
74.7K papers, 2M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
77% related
Wavelet
78K papers, 1.3M citations
76% related
Support vector machine
73.6K papers, 1.7M citations
75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593