Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal Article
TL;DR: A deconvolutive short-time Fourier transform (DSTFT) spectrogram method is proposed, which improves the time-frequency resolution and reduces the cross-terms simultaneously by applying a 2-D deconvolution operation on the STFT spectrogram.
Abstract: The short-time Fourier transform (STFT) spectrogram, which is the squared modulus of the STFT, is a smoothed version of the Wigner-Ville distribution (WVD). The STFT spectrogram is the 2-D convolution of the signal WVD and the window function WVD. In this letter, we propose a deconvolutive short-time Fourier transform (DSTFT) spectrogram method, which improves the time-frequency resolution and reduces the cross-terms simultaneously by applying a 2-D deconvolution operation on the STFT spectrogram. Compared to the STFT spectrogram, the spectrogram obtained by the proposed method shows a clear improvement in the time-frequency resolution. Computer simulations are provided to illustrate the good performance of the proposed method, compared with some traditional time-frequency representation (TFR) methods.

76 citations
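
As an illustration of the definition used in the abstract above (the spectrogram as the squared modulus of the STFT), here is a minimal NumPy sketch. It does not implement the DSTFT deconvolution step. The function name `stft_spectrogram`, the Hann window, and the window and hop lengths are assumptions chosen for the example, not details taken from the paper.

```python
import numpy as np

def stft_spectrogram(x, win_len=128, hop=32):
    """Squared modulus of the STFT of a real signal.

    The Hann window and the default sizes are illustrative assumptions.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1)   # shape: (n_frames, win_len // 2 + 1)
    return np.abs(stft) ** 2             # spectrogram = |STFT|^2

# Test signal: a 125 Hz tone sampled at 1 kHz for one second.
fs = 1000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 125 * t)

S = stft_spectrogram(x)
peak_bin = int(S.mean(axis=0).argmax())  # strongest frequency bin on average
peak_hz = peak_bin * fs / 128            # convert bin index to Hz
```

A longer window narrows the spectral peak but blurs events in time; that trade-off is the resolution limit the abstract refers to.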

Proceedings Article
20 Aug 2017
TL;DR: The proposed postfilter can be used to reduce the gap between synthesized and target spectra, even in the high-dimensional STFT domain, and is applied to a DNN-based speech-synthesis task.
Abstract: We propose a learning-based postfilter to reconstruct the high-fidelity spectral texture in short-time Fourier transform (STFT) spectrograms. In speech-processing systems, such as speech synthesis, voice conversion, and speech enhancement, STFT spectrograms have been widely used as key acoustic representations. In these tasks, we normally need to precisely generate or predict the representations from inputs; however, generated spectra typically lack the fine structure of the true data. To overcome this limitation and reconstruct spectra having finer structures, we propose a generative adversarial network (GAN)-based postfilter that is implicitly optimized to match the true feature distribution in adversarial learning. The challenge with this postfilter is that a GAN cannot be easily trained for very high-dimensional data such as the STFT. Therefore, we introduce a divide-and-concatenate strategy. We first divide the spectrograms into multiple frequency bands with overlap, train the GAN-based postfilter for the individual bands, and finally connect the bands with overlap. We applied our proposed postfilter to a DNN-based speech-synthesis task. The results show that our proposed postfilter can be used to reduce the gap between synthesized and target spectra, even in the high-dimensional STFT domain.

75 citations
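
The divide-and-concatenate strategy described in the abstract above can be sketched roughly as follows. This only shows the band bookkeeping: the GAN postfilter is replaced by an identity placeholder, and averaging the overlapping bins when reconnecting the bands is an assumption of this sketch, since the abstract does not say how the overlaps are merged. The names `split_bands`, `merge_bands`, and `postfilter` are hypothetical.

```python
import numpy as np

def split_bands(spec, band_width, overlap):
    """Split a (n_frames, n_bins) spectrogram into overlapping frequency bands."""
    step = band_width - overlap
    n_bins = spec.shape[1]
    starts = list(range(0, n_bins - band_width + 1, step))
    # Add a final band if the regular grid does not reach the top bin.
    if starts[-1] + band_width < n_bins:
        starts.append(n_bins - band_width)
    return starts, [spec[:, s:s + band_width] for s in starts]

def merge_bands(starts, bands, n_bins):
    """Reconnect the bands, averaging wherever they overlap (assumed rule)."""
    n_frames, band_width = bands[0].shape
    total = np.zeros((n_frames, n_bins))
    count = np.zeros(n_bins)
    for s, band in zip(starts, bands):
        total[:, s:s + band_width] += band
        count[s:s + band_width] += 1
    return total / count

def postfilter(band):
    """Hypothetical stand-in for a per-band trained postfilter."""
    return band

spec = np.random.default_rng(0).random((4, 105))
starts, bands = split_bands(spec, band_width=40, overlap=10)
merged = merge_bands(starts, [postfilter(b) for b in bands], spec.shape[1])
```

With the identity placeholder the merged result equals the input, which is a quick check that the split and merge steps are consistent.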

Journal Article
TL;DR: In this paper, a multi-head convolutional neural network (MCNN) was proposed for waveform synthesis from spectrograms, with transposed convolution layers in parallel heads.
Abstract: We propose the multi-head convolutional neural network (MCNN) for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN enables significantly better utilization of modern multi-core processors than commonly used iterative algorithms like Griffin–Lim, and yields very fast (more than 300 × real time) runtime. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.

75 citations
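
To make the idea of transposed convolution as an upsampling step more concrete, here is a minimal 1-D NumPy sketch. It is not the MCNN architecture: there is no training and no nonlinearity, and summing the heads is an assumed combination rule. The names `transposed_conv1d` and `multi_head` are hypothetical.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """Upsample x: each input sample adds a scaled copy of the kernel to the output."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

def multi_head(x, kernels, stride):
    """Run several heads in parallel and sum their outputs (assumed rule)."""
    return sum(transposed_conv1d(x, k, stride) for k in kernels)

x = np.array([1.0, 2.0])
smooth = np.array([1.0, 1.0, 1.0])
edge = np.array([0.5, 0.0, -0.5])

single = transposed_conv1d(x, smooth, stride=2)
heads = multi_head(x, [smooth, edge], stride=2)
```

Because the heads are independent of each other, they can be evaluated in parallel, which is the property the abstract credits for efficient use of multi-core processors.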

Proceedings Article
04 May 2020
TL;DR: The performance of the models is significantly enhanced by the use of log-mel deltas, and overall the approach is capable of training strong single models, without use of any supplementary data from outside the official challenge dataset, with excellent generalization to unknown devices.
Abstract: We investigate the problem of acoustic scene classification, using a deep residual network applied to log-mel spectrograms complemented by log-mel deltas and delta-deltas. We design the network to take into account that the temporal and frequency axes in spectrograms represent fundamentally different information. In particular, we use two pathways in the residual network, one for high frequencies and one for low frequencies, that are fused just two convolutional layers prior to the network output. We conduct experiments using two public 2019 DCASE datasets for acoustic scene classification: the first with binaural audio inputs recorded by a single device, and the second with single-channel audio inputs recorded through various devices. We show that the performance of our models is significantly enhanced by the use of log-mel deltas, and that overall our approach is capable of training strong single models, without use of any supplementary data from outside the official challenge dataset, with excellent generalization to unknown devices. In particular, our approach achieved second place in 2019 DCASE Task 1B (0.4% behind the winning entry), and the best Task 1B evaluation results (by a large margin of over 5%) on test data from a device not used to record any training data.

75 citations
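
The log-mel deltas and delta-deltas mentioned in the abstract above are time derivatives of the log-mel spectrogram, stacked as extra input channels. A minimal sketch, assuming simple finite differences via `numpy.gradient` (the paper may use a different delta estimator) and a hypothetical helper name `add_deltas`:

```python
import numpy as np

def add_deltas(logmel):
    """Stack log-mel, delta, and delta-delta features as three channels.

    logmel has shape (n_mels, n_frames); time is the last axis.
    Finite differences are an assumption of this sketch.
    """
    delta = np.gradient(logmel, axis=-1)    # first derivative along time
    delta2 = np.gradient(delta, axis=-1)    # second derivative along time
    return np.stack([logmel, delta, delta2])

# Toy input: 4 mel bands whose energy rises by 2 per frame over 6 frames.
logmel = 2.0 * np.tile(np.arange(6.0), (4, 1))
features = add_deltas(logmel)               # shape: (3, n_mels, n_frames)
```

For this linear ramp the delta channel is constant and the delta-delta channel is zero, which makes the sketch easy to sanity-check.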

Journal Article
TL;DR: A novel approach based on sparse linear regression (SLR) is developed: the problem is formulated as under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM).
Abstract: Frequency hopping (FH) signals have well-documented merits for commercial and military applications due to their near-far resistance and robustness to jamming. Estimating FH signal parameters (e.g., hopping instants, carriers, and amplitudes) is an important and challenging task, but optimum estimation incurs an unrealistic computational burden. The spectrogram has long been the starting non-parametric estimator in this context, followed by line spectra refinements. The problem is that hop timing estimates derived from the spectrogram are coarse and unreliable, thus severely limiting performance. A novel approach is developed in this paper, based on sparse linear regression (SLR). Using a dense frequency grid, the problem is formulated as one of under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM). The SLR-based approach is further broadened to encompass polynomial-phase hopping (PPH) signals, encountered in chirp spread spectrum modulation. Simulations demonstrate that the developed estimator outperforms spectrogram-based alternatives, especially with regard to hop timing estimation, which is the crux of the problem.

75 citations
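
To give a feel for how the alternating direction method of multipliers solves an under-determined sparse regression, here is a minimal sketch for the plain lasso problem, minimizing 0.5 * ||Ax - b||^2 + lam * ||x||_1. This is a simplification: it uses a single sparsity penalty rather than the dual penalty of the paper, real-valued data, and a random Gaussian matrix in place of a dense frequency grid. The names `lasso_admm` and `soft_threshold` and all parameter values are assumptions of this sketch.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, b, lam, rho=1.0, n_iter=1000):
    """Minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1 with ADMM."""
    n = A.shape[1]
    inv = np.linalg.inv(A.T @ A + rho * np.eye(n))  # reused every iteration
    Atb = A.T @ b
    x = z = u = np.zeros(n)
    for _ in range(n_iter):
        x = inv @ (Atb + rho * (z - u))        # least-squares update
        z = soft_threshold(x + u, lam / rho)   # sparsity update
        u = u + x - z                          # scaled dual update
    return z

# Under-determined toy problem: 40 equations, 60 unknowns, 3 nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 60)) / np.sqrt(40)
x_true = np.zeros(60)
x_true[[5, 22, 47]] = [3.0, -3.0, 3.0]
b = A @ x_true

z = lasso_admm(A, b, lam=0.05)
recovered = set(np.argsort(-np.abs(z))[:3].tolist())  # 3 largest coefficients
```

The three largest coefficients of `z` should sit at the positions of the true nonzeros, even though the linear system alone has infinitely many solutions.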


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593