Topic
Spectrogram
About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.
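For context, a spectrogram is the squared-magnitude short-time Fourier transform of a signal: the signal is cut into overlapping windowed frames and each frame is Fourier-transformed. A minimal NumPy sketch (the function name and parameters here are illustrative, not from any paper on this page):

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude-squared STFT: frame the signal into overlapping
    windows, apply a Hann window, and FFT each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Keep only the non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft//2 + 1)

# Example: a 440 Hz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = S.mean(axis=0).argmax()   # strongest frequency bin
peak_hz = peak_bin * fs / 256        # bin index -> Hz (resolution fs/n_fft)
```

The peak lands in the bin nearest 440 Hz (437.5 Hz at this 31.25 Hz bin spacing), which is the resolution tradeoff the FRCNN abstract below refers to.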
Papers published on a yearly basis
Papers
••
TL;DR: A novel spatiotemporal and frequential cascaded attention network with large-margin learning is proposed that achieves promising performance in speech emotion recognition.
33 citations
••
TL;DR: FRCNN is proposed to embrace the progress in very deep architecture, feature fusion and convolutional operation, and depth-wise separable convolution is utilized to reduce the number of trainable parameters.
Abstract: Convolutional neural networks with spectrogram feature representations for acoustic scene classification are attracting increasing attention due to their favorable performance. However, most existing methods remain constrained by the tradeoff between time-frequency feature resolution (the minimum coverage area across the time-frequency representation) and the depth of the CNN model, so performance cannot be improved by simply deepening the network. In this paper, a fine-resolution convolutional neural network (FRCNN) is proposed to embrace progress in very deep architectures, feature fusion, and convolutional operations. Specifically, lateral construction is applied to generate a fine-resolution feature map with semantic information, and depth-wise separable convolution is used to reduce the number of trainable parameters. Extensive experiments demonstrate that the proposed FRCNN achieves high performance on several metrics with low computational complexity.
33 citations
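The depth-wise separable convolution mentioned in the FRCNN abstract replaces one full C_in x C_out x k x k kernel with a per-channel k x k depthwise filter followed by a 1x1 pointwise mix. A quick parameter-count comparison (the layer sizes are illustrative, not taken from the paper):

```python
def conv_params(c_in, c_out, k):
    # Standard 2-D convolution: one k x k kernel per (input, output) pair.
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel,
    # plus a 1 x 1 pointwise convolution mixing channels.
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 128, 3)        # 73728 parameters
separable = separable_params(64, 128, 3)  # 8768 parameters
ratio = standard / separable              # roughly 8x fewer parameters
```

This is why separable convolutions let a network go deeper at the same parameter budget, which is the lever the FRCNN design exploits.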
••
TL;DR: The HR-NMF model is extended to multichannel signals and to convolutive mixtures, and a fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model.
Abstract: Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.
33 citations
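HR-NMF itself additionally models phase and local correlation, but the core idea it extends can be illustrated with plain NMF on a magnitude spectrogram using the classic multiplicative updates (this is the standard Lee-Seung algorithm, not the HR-NMF model from the paper):

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Factor a nonnegative matrix V ~= W @ H with multiplicative
    updates minimizing the Euclidean distance."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update spectral templates
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
templates = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
activations = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
V = templates @ activations
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

In source-separation use, the columns of W play the role of the "spectral dictionary" and the rows of H the "temporal codes" referred to in the next entry.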
••
TL;DR: Experimental tests have been conducted, which show that the extraction of the spectral dictionary and temporal codes is more efficient using sparsity learning and subsequently leads to better separation performance.
Abstract: An unsupervised single-channel audio separation method is presented from a pattern recognition viewpoint. The proposed method requires no training knowledge, and the separation system is based on non-uniform time-frequency (TF) analysis and feature extraction. Unlike conventional research that concentrates on the spectrogram or its variants, the proposed separation algorithm uses an alternative TF representation based on the gammatone filterbank. In particular, the monaural mixed audio signal is shown to be considerably more separable in this non-uniform TF domain; an analysis of signal separability is provided to verify this finding. In addition, a variational Bayesian approach is derived to learn the sparsity parameters for optimizing the matrix factorization. Experimental tests show that extraction of the spectral dictionary and temporal codes is more efficient with sparsity learning, which subsequently leads to better separation performance.
33 citations
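The gammatone filterbank behind that non-uniform TF representation has a well-known impulse response, g(t) = t^(n-1) e^(-2*pi*b*t) cos(2*pi*fc*t). A minimal sketch of one such filter applied by convolution (the bandwidth formula follows the common ERB convention; exact constants vary across implementations):

```python
import numpy as np

def gammatone_ir(fc, fs, n=4, duration=0.05):
    """Impulse response of an n-th order gammatone filter centred at fc,
    with bandwidth tied to the equivalent rectangular bandwidth (ERB)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + fc / 9.265        # Glasberg & Moore ERB approximation
    b = 1.019 * erb                # common bandwidth scaling
    g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(g).max()     # crude peak normalization

fs = 16000
ir = gammatone_ir(1000.0, fs)
# Filter a signal by convolution with the impulse response.
x = np.random.default_rng(1).standard_normal(fs // 4)
y = np.convolve(x, ir, mode="same")
```

A full filterbank repeats this for center frequencies spaced on an ERB scale, yielding the non-uniform frequency resolution the paper exploits.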
••
18 Mar 2005
TL;DR: A monaural noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy speech, estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system.
Abstract: A monaural noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy speech. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, despite being spectrally overlapping with the speech. The performance of the algorithm is evaluated using subjective and objective tests and compared to the optimal smoothing and minimum statistics approach (Martin, 2001). The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
32 citations
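Spectrotemporal modulation filtering can be caricatured as masking in the 2-D Fourier domain of the spectrogram: slow temporal and spectral modulations, where speech energy concentrates, are kept, and faster modulations are attenuated. A toy low-pass sketch, not the auditory-model pipeline from the paper:

```python
import numpy as np

def modulation_lowpass(S, keep_frac=0.25):
    """Keep only the lowest-rate modulations of a (freq x time)
    spectrogram via a box mask in its 2-D Fourier transform."""
    M = np.fft.fft2(S)
    kf = max(1, int(S.shape[0] * keep_frac))
    kt = max(1, int(S.shape[1] * keep_frac))
    mask = np.zeros_like(S)
    # Low modulation rates live at the corners of the unshifted FFT.
    mask[:kf, :kt] = 1.0
    mask[:kf, -kt:] = 1.0
    mask[-kf:, :kt] = 1.0
    mask[-kf:, -kt:] = 1.0
    return np.real(np.fft.ifft2(M * mask))

rng = np.random.default_rng(0)
S = rng.random((64, 128))          # stand-in for a noisy spectrogram
S_smooth = modulation_lowpass(S)   # fast modulations suppressed
```

Noise with distinctive (fast) modulation patterns loses energy under such a mask while slowly modulated speech structure survives, which is the intuition behind the paper's approach.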