Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Posted Content
TL;DR: The proposed Parallel WaveGAN has only 1.44 M parameters, generates 24 kHz speech waveforms 28.68 times faster than real time on a single GPU, and achieves a mean opinion score comparable to the best distillation-based Parallel WaveNet system.
Abstract: We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of realistic speech waveforms. As our method does not require the density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms 28.68 times faster than real time in a single-GPU environment. Perceptual listening test results verify that our proposed method achieves a 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
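As a rough illustration of the multi-resolution spectrogram loss described above, the sketch below combines a spectral-convergence term and a log-magnitude term over several STFT resolutions in PyTorch; the FFT sizes, hop lengths, and equal weighting are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a multi-resolution STFT (spectrogram) loss; the resolution
# choices below are assumptions for illustration, not the paper's settings.
import torch

def stft_magnitude(x, fft_size, hop_length, win_length):
    """Magnitude STFT of a batch of waveforms, shape (batch, frames, bins)."""
    window = torch.hann_window(win_length, device=x.device)
    spec = torch.stft(x, fft_size, hop_length, win_length,
                      window=window, return_complex=True)
    return spec.abs().transpose(1, 2)

def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 256, 1024),
                                            (2048, 512, 2048),
                                            (512, 128, 512))):
    """Average spectral-convergence + log-magnitude L1 loss over several resolutions."""
    loss = 0.0
    for fft_size, hop, win in resolutions:
        p = stft_magnitude(pred, fft_size, hop, win)
        t = stft_magnitude(target, fft_size, hop, win)
        sc = torch.norm(t - p) / torch.norm(t)                 # spectral convergence
        mag = torch.nn.functional.l1_loss(torch.log(p + 1e-7),
                                          torch.log(t + 1e-7)) # log STFT magnitude
        loss = loss + sc + mag
    return loss / len(resolutions)
```

In training, a term like this would be added to the adversarial loss; the combination is what lets the non-autoregressive generator match the time-frequency structure of real speech.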

256 citations

Journal ArticleDOI
TL;DR: The utility of using time-frequency representations (TFRs) to quantitatively resolve changes in the frequency content of nonstationary Lamb wave signals, as a function of time, is illustrated.
Abstract: The objective of this study is to establish the effectiveness of four different time-frequency representations (TFRs)—the reassigned spectrogram, the reassigned scalogram, the smoothed Wigner–Ville distribution, and the Hilbert spectrum—by comparing their ability to resolve the dispersion relationships for Lamb waves generated and detected with optical techniques. This paper illustrates the utility of using TFRs to quantitatively resolve changes in the frequency content of these nonstationary signals as a function of time. While each technique has certain strengths and weaknesses, the reassigned spectrogram appears to be the best choice to characterize multimode Lamb waves.
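To make the idea of tracking time-varying frequency content concrete, here is a small Python sketch that computes an ordinary (non-reassigned) spectrogram of a synthetic chirp with SciPy and reads off its frequency ridge over time; the sampling rate and sweep range are assumptions chosen only for demonstration.

```python
# Spectrogram of a nonstationary (chirp) signal; all parameters are
# illustrative assumptions, not taken from the paper's experiments.
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1_000_000                                   # 1 MHz sampling rate (assumed)
t = np.arange(0, 1e-3, 1 / fs)                   # 1 ms record
x = chirp(t, f0=50e3, t1=1e-3, f1=400e3)         # sweep from 50 kHz to 400 kHz

f, tt, Sxx = spectrogram(x, fs=fs, window="hann",
                         nperseg=256, noverlap=192)

# The ridge of the spectrogram approximates the instantaneous frequency,
# i.e. the frequency-versus-time trajectory of the nonstationary signal.
ridge_freq = f[np.argmax(Sxx, axis=0)]
print(ridge_freq[:5], "Hz")
```

A reassigned spectrogram sharpens this picture by relocating each time-frequency value toward the local energy centroid, which is why it can separate closely spaced modes better than the plain spectrogram.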

253 citations

Journal ArticleDOI
TL;DR: A regularization scheme is introduced that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
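For orientation, the sketch below shows a minimal PyTorch vector-quantization bottleneck of the VQ-VAE kind compared in the abstract, with nearest-neighbour codebook lookup, codebook/commitment losses, and a straight-through gradient; the codebook size, code dimension, and loss weighting are illustrative assumptions, not the paper's settings.

```python
# Minimal VQ bottleneck sketch (VQ-VAE style); hyperparameters are assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                      # z_e: (batch, frames, code_dim)
        flat = z_e.reshape(-1, z_e.size(-1))
        # Squared distance from every frame to every codebook vector.
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)                    # nearest-neighbour code index
        z_q = self.codebook(idx).view_as(z_e)
        # Codebook + commitment losses; gradients pass straight through.
        vq_loss = ((z_q - z_e.detach()).pow(2).mean()
                   + self.beta * (z_e - z_q.detach()).pow(2).mean())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, vq_loss, idx.view(z_e.shape[:-1])
```

The discrete indices returned here are what the abstract refers to when it measures how easily the learned codes map to phonemes.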

252 citations

Journal ArticleDOI
TL;DR: A new architecture is introduced that extracts mel-frequency cepstral coefficients, chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast features from sound files and uses them as inputs to a one-dimensional convolutional neural network for emotion identification, evaluated on samples from the Ryerson Audio-Visual Database of Emotional Speech and Song, Berlin, and EMO-DB datasets.
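A hedged sketch of how the five named feature types might be extracted and stacked with librosa for such a one-dimensional CNN is shown below; the parameter choices (number of MFCCs, averaging over time into one vector per utterance, and so on) are simplifying assumptions and not necessarily what the paper does.

```python
# Illustrative feature extraction for speech emotion recognition; parameters
# and the time-averaging step are assumptions, not the paper's exact pipeline.
import numpy as np
import librosa

def emotion_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma   = librosa.feature.chroma_stft(y=y, sr=sr)
    mel      = librosa.feature.melspectrogram(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz  = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # Average each feature over time so one utterance becomes a single vector
    # that a 1-D convolutional network (or any classifier) can take as input.
    return np.concatenate([feat.mean(axis=1)
                           for feat in (mfcc, chroma, mel, contrast, tonnetz)])
```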

251 citations

Journal ArticleDOI
TL;DR: Spectrogram correlation can be used not only for detection and maximum likelihood parameter estimation, e.g., estimation of the delay or center frequency of a signal, but also for classification.
Abstract: A locally optimum detector correlates the data spectrogram with a reference spectrogram in order to detect (i) a known signal with unknown delay and Doppler parameters, (ii) a random signal with known covariance function, or (iii) the output of a random, time-varying channel with known scattering function. Spectrogram correlation can also be used for maximum likelihood parameter estimation, e.g., estimation of delay or center frequency of a signal. To estimate an analog input signal from its spectrogram, a modified deconvolution operation can be used together with a predictive noise canceler. If no noise is added to the spectrogram, the mean-square error of this signal estimate is independent of the window function that is used to construct the spectrogram. When estimates of specific signal parameters are obtained directly from the spectrogram, these estimates have mean-square errors that depend upon both signal and window waveforms. Spectrogram correlation can be used for classification as well as for estimation and detection. Parameter estimators and detectors are, in fact, specialized kinds of classifiers.
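As a toy numerical illustration of the core operation, the Python sketch below correlates the spectrogram of noisy data against the spectrogram of a known reference signal over candidate delays and picks the peak; the signal, noise level, and the simple unnormalized correlation are assumptions for demonstration only.

```python
# Toy spectrogram-correlation delay estimator; all parameters are assumed
# for illustration and do not come from the paper.
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
template = chirp(t, f0=500, t1=0.2, f1=1500)          # known signal
data = np.concatenate([np.zeros(2000), template, np.zeros(2000)])
data = data + 0.5 * np.random.randn(data.size)        # additive noise

_, _, S_ref  = spectrogram(template, fs=fs, nperseg=256, noverlap=128)
_, tt, S_dat = spectrogram(data,     fs=fs, nperseg=256, noverlap=128)

# Slide the reference spectrogram along the time axis of the data spectrogram,
# correlate, and take the peak as the delay estimate.
n_ref = S_ref.shape[1]
scores = np.array([np.sum(S_dat[:, k:k + n_ref] * S_ref)
                   for k in range(S_dat.shape[1] - n_ref + 1)])
print("estimated delay:", tt[scores.argmax()], "s")
```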

248 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance
Metrics
Number of papers in the topic in previous years:
Year: Papers
2024: 1
2023: 627
2022: 1,396
2021: 488
2020: 595
2019: 593