
Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Posted Content
Ron Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma
TL;DR: A sequence-to-sequence neural network which directly generates speech waveforms from text inputs, extending the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop, enabling parallel training and synthesis.
Abstract: We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop. Output waveforms are modeled as a sequence of non-overlapping fixed-length blocks, each one containing hundreds of samples. The interdependencies of waveform samples within each block are modeled using the normalizing flow, enabling parallel training and synthesis. Longer-term dependencies are handled autoregressively by conditioning each flow on preceding blocks. This model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features or additional loss terms. Contemporary state-of-the-art text-to-speech (TTS) systems use a cascade of separately learned models: one (such as Tacotron) which generates intermediate features (such as spectrograms) from text, followed by a vocoder (such as WaveRNN) which generates waveform samples from the intermediate features. The proposed system, in contrast, does not use a fixed intermediate representation, and learns all parameters end-to-end. Experiments show that the proposed model generates speech with quality approaching a state-of-the-art neural TTS system, with significantly improved generation speed.
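For intuition, here is a minimal, hypothetical sketch of the block-autoregressive decoding loop described above: each fixed-length block of waveform samples is produced jointly by inverting a normalizing flow whose parameters are conditioned on previously generated blocks. The toy affine flow and all names below are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

BLOCK_LEN = 256   # each block holds hundreds of waveform samples
NUM_BLOCKS = 8
rng = np.random.default_rng(0)


def toy_flow_inverse(z, context):
    """Invert a toy affine flow: x = scale(context) * z + shift(context).

    A real model would use a deep conditional normalizing flow; this
    stand-in only shows that the whole block is transformed in parallel.
    """
    ctx_mean = context[-BLOCK_LEN:].mean() if context.size else 0.0
    scale = 0.1 + 0.05 * np.tanh(ctx_mean)
    shift = 0.01 * ctx_mean
    return scale * z + shift


def decode(num_blocks=NUM_BLOCKS):
    waveform = np.zeros(0)
    for _ in range(num_blocks):
        z = rng.standard_normal(BLOCK_LEN)             # latent for one block
        block = toy_flow_inverse(z, waveform)          # parallel within a block
        waveform = np.concatenate([waveform, block])   # autoregressive across blocks
    return waveform


print(decode().shape)  # (NUM_BLOCKS * BLOCK_LEN,)
```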

40 citations

Journal ArticleDOI
TL;DR: A novel method is proposed for diagnosing bearing faults and their degradation level under variable shaft speed, achieving very high accuracy and robustness even in noisy environments.
Abstract: Predicting bearing faults is an essential task in machine health monitoring because bearings are vital components of rotary machines, especially heavy motor machines. Moreover, indicating the degradation level of bearings will help factories plan maintenance schedules. With advancements in the extraction of useful information from vibration signals, diagnosis of motor failures by maintenance engineers can be gradually replaced by an automatic detection process. In particular, state-of-the-art methods using deep learning have contributed significantly to automatic fault diagnosis. This paper proposes a novel method for diagnosing bearing faults and their degradation level under variable shaft speed. In the proposed method, vibration signals are represented as spectrograms through Short-Time Fourier Transform (STFT) preprocessing so that deep learning methods can be applied. Then, feature extraction and health status classification are performed by a convolutional neural network (CNN), VGG16. According to our experiments, the proposed method achieves very high accuracy and robustness for bearing fault diagnosis even in noisy environments.
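A hedged sketch of the described pipeline, assuming SciPy for the STFT and a recent torchvision for the VGG16 backbone; the sampling rate, class count, and data are illustrative placeholders, and the training loop is omitted.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft
from torchvision import models

NUM_CLASSES = 10  # illustrative: fault types x degradation levels


def vibration_to_spectrogram(signal, fs=12_000, nperseg=256):
    """Turn a 1-D vibration signal into a 3-channel log-magnitude spectrogram."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    spec = np.log1p(np.abs(Z)).astype(np.float32)
    # Replicate to 3 channels so an ImageNet-style VGG16 accepts it.
    return torch.from_numpy(spec).unsqueeze(0).repeat(3, 1, 1)


# VGG16 backbone with the last layer replaced for health-status classes.
# (Use weights=models.VGG16_Weights.DEFAULT for ImageNet pretraining.)
model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

# Forward pass on a synthetic vibration segment (1 s at 12 kHz).
segment = np.random.randn(12_000).astype(np.float32)
x = vibration_to_spectrogram(segment).unsqueeze(0)   # (1, 3, F, T)
x = nn.functional.interpolate(x, size=(224, 224))    # resize to VGG16 input
print(model(x).shape)                                # torch.Size([1, NUM_CLASSES])
```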

40 citations

Proceedings ArticleDOI
14 Aug 2009
TL;DR: The presented algorithm was tested using actual stressful speech utterances from the SUSAS (Speech Under Simulated and Actual Stress) database at the vowel level, and the results indicated that the proposed method can be applied to voiced speech under speech-independent conditions.
Abstract: This paper presents a new system for automatic stress detection in speech. In the feature extraction process, speech spectrograms were used as the primary features. Sigma-pi neuron cells were then employed to derive the secondary features. The analysis was performed on three alternative sets of analytical frequency bands: critical bands, Bark scale bands, and equivalent rectangular bandwidth (ERB) scale bands. The presented algorithm was tested using actual stressful speech utterances from the SUSAS (Speech Under Simulated and Actual Stress) database at the vowel level. The automatic stress-level classification was implemented using Gaussian mixture model (GMM) and k-nearest neighbor (KNN) classifiers. The strongest effect on the classification results was observed when selecting the type of frequency bands. The ERB scale provided the highest classification results, ranging from 67.84% to 73.76%. The classification results did not differ between data sets containing specific types of vowels and data sets containing mixtures of vowels. This indicates that the proposed method can be applied to voiced speech under speech-independent conditions.
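A rough illustration of the classification stage under stated assumptions: spectrogram energies are pooled into coarse, equal-width frequency bands (standing in for the paper's critical/Bark/ERB bands and sigma-pi secondary features), then classified with scikit-learn's KNN and per-class GMMs. The data here are synthetic.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

FS = 8_000


def band_energy_features(signal, n_bands=16):
    """Pool spectrogram energy into coarse frequency bands (illustrative only)."""
    _, _, S = spectrogram(signal, fs=FS, nperseg=256)
    bands = np.array_split(S, n_bands, axis=0)       # split along frequency
    return np.array([b.mean() for b in bands])       # one energy per band


# Synthetic stand-in data: two "stress levels", 40 utterances each.
rng = np.random.default_rng(0)
X = np.vstack([band_energy_features(rng.standard_normal(FS) * (1.0 + lab))
               for lab in (0, 1) for _ in range(40)])
y = np.repeat([0, 1], 40)

# KNN classifier, as in the paper's comparison.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# One GMM per class; predict the class whose GMM gives the higher likelihood.
gmms = [GaussianMixture(n_components=2, random_state=0).fit(X[y == c]) for c in (0, 1)]
gmm_pred = np.stack([g.score_samples(X) for g in gmms], axis=1).argmax(axis=1)
print("KNN acc:", knn.score(X, y), "GMM acc:", (gmm_pred == y).mean())
```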

39 citations

Posted Content
TL;DR: In this paper, a cycle-consistent generative adversarial network (CycleGAN) is proposed for unsupervised speech domain adaptation, which employs multiple independent discriminators on the power spectrogram, each in charge of different frequency bands.
Abstract: Domain adaptation plays an important role for speech recognition models, in particular for domains that have low resources. We propose a novel generative model based on the cycle-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation. The proposed model employs multiple independent discriminators on the power spectrogram, each in charge of different frequency bands. As a result, we have 1) better discriminators that focus on fine-grained details of the frequency features, and 2) a generator that is capable of generating more realistic domain-adapted spectrograms. We demonstrate the effectiveness of our method on speech recognition with gender adaptation, where the model only has access to supervised data from one gender during training but is evaluated on the other at test time. Our model achieves an average relative improvement of 7.41% in phoneme error rate and 11.10% in word error rate over the baseline, on the TIMIT and WSJ datasets, respectively. Qualitatively, our model also generates more natural-sounding speech when conditioned on data from the other domain.
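A hedged PyTorch sketch of the multi-discriminator idea: the power spectrogram is split along the frequency axis and each band is judged by its own small discriminator. The generator, cycle-consistency losses, and network sizes are omitted or illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

N_BANDS = 4  # illustrative number of frequency bands


def make_band_discriminator():
    """A small convolutional discriminator for one frequency band."""
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
    )


class BandDiscriminators(nn.Module):
    """Independent discriminators, each judging a different frequency band."""

    def __init__(self, n_bands=N_BANDS):
        super().__init__()
        self.n_bands = n_bands
        self.discs = nn.ModuleList(make_band_discriminator() for _ in range(n_bands))

    def forward(self, spec):                              # spec: (B, 1, F, T)
        bands = torch.chunk(spec, self.n_bands, dim=2)    # split along frequency
        return [d(b) for d, b in zip(self.discs, bands)]  # one realism score per band


# Example: score a batch of (generated) power spectrograms.
scores = BandDiscriminators()(torch.rand(2, 1, 128, 200))
print([s.shape for s in scores])  # N_BANDS x torch.Size([2, 1])
```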

39 citations

Proceedings ArticleDOI
25 Mar 2012
TL;DR: New variants of the non-negative matrix factorization concept are introduced that incorporate music-specific constraints, exploiting the structural regularities of music spectrograms.
Abstract: Music spectrograms typically have many structural regularities that can be exploited to help solve the problem of decomposing a given spectrogram into distinct musically meaningful components. In this paper, we introduce new variants of the non-negative matrix factorization concept that incorporate music-specific constraints.
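For reference, a minimal sketch of plain (unconstrained) NMF on a magnitude spectrogram using scikit-learn; the paper's music-specific constraints (for example, constraints on the structure of the spectral templates or their activations) are not reproduced here.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

# Synthetic stand-in: two tones with different on/off patterns, mixed together.
fs = 16_000
t = np.arange(2 * fs) / fs
mix = np.sin(2 * np.pi * 440 * t) * (t < 1.0) + 0.5 * np.sin(2 * np.pi * 660 * t) * (t > 0.5)

# Magnitude spectrogram V (frequency x time) is the matrix being factorized.
_, _, Z = stft(mix, fs=fs, nperseg=1024)
V = np.abs(Z)

# Plain NMF: V ≈ W @ H, where W holds spectral templates and H their activations.
# The paper's variants add music-specific constraints on W and/or H.
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(V)   # (freq_bins, n_components)
H = model.components_        # (n_components, time_frames)
print(W.shape, H.shape)
```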

39 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593