Topic: Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications on this topic have received 81,547 citations.


Papers
Journal Article
TL;DR: Nonnegative matrix factorization is used to derive a novel description of the timbre of musical sounds: a spectrogram is factorized to provide a characteristic spectral basis, and the resulting compression is shown to reduce noise in the data set, yielding more stable classification models.
Abstract: Nonnegative matrix factorization (NMF) is used to derive a novel description for the timbre of musical sounds. Using NMF, a spectrogram is factorized to provide a characteristic spectral basis. Given a set of spectrograms for a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the spectral basis for this musical genre. This description is shown to improve classification results by up to 23.3% compared to MFCC-based models, while the compression performed by the factorization decreases training time significantly. Using a distance-based stability measure, this compression is shown to reduce the noise present in the data set, resulting in more stable classification models. In addition, we compare the mean squared errors of the approximations to a spectrogram obtained with independent component analysis and with nonnegative matrix factorization, showing the superiority of the latter approach.

116 citations
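To make the factorization step concrete, here is a minimal sketch of applying NMF to a magnitude spectrogram to obtain a spectral basis; the input file name, component count, and STFT settings are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

# Magnitude spectrogram of an audio clip (hypothetical file name).
y, sr = librosa.load("clip.wav", sr=22050)
S = np.abs(librosa.stft(y, n_fft=1024))          # shape: (freq_bins, frames)

# Factorize S ~ W @ H with a nonnegative rank-8 approximation.
nmf = NMF(n_components=8, init="nndsvd", max_iter=500)
W = nmf.fit_transform(S)   # spectral basis: each column is a characteristic spectrum
H = nmf.components_        # activations of each basis vector over time

# Relative Frobenius error of the low-rank approximation.
err = np.linalg.norm(S - W @ H, "fro") / np.linalg.norm(S, "fro")
print(f"relative approximation error: {err:.3f}")
```

In the paper, the basis vectors gathered from a genre's spectrograms are then modeled with Gaussian mixtures; the sketch stops at the factorization itself.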

Proceedings Article
15 Apr 2018
TL;DR: This paper presents a statistical method for single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech; the method outperformed a conventional DNN-based method in unseen noisy environments.
Abstract: This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. This supervised approach requires a very large amount of paired data for training, and it is not robust against unknown environments. Another approach is to use non-negative matrix factorization (NMF) based on basis spectra trained on clean speech in advance and basis spectra adapted to the noise on the fly. This semi-supervised approach, however, causes considerable signal distortion in the enhanced speech due to the unrealistic assumption that speech spectrograms are linear combinations of the basis spectra. Replacing the poor linear generative model of clean speech in NMF with a VAE (a powerful nonlinear deep generative model) trained on clean speech, we formulate a unified probabilistic generative model of noisy speech. Given noisy speech as observed data, we can sample clean speech from its posterior distribution. The proposed method outperformed the conventional DNN-based method in unseen noisy environments.

115 citations
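The following is a minimal sketch of the kind of VAE used here as a clean-speech prior: an encoder maps a spectrogram frame to a latent Gaussian posterior, and a decoder maps a latent sample back to the spectral domain. Layer sizes and the frame dimension are illustrative assumptions, and the full method further couples this prior with an NMF-based noise model, which is omitted here:

```python
import torch
import torch.nn as nn

class SpeechVAE(nn.Module):
    # Frame-wise VAE over log-power spectra (all dimensions are assumptions).
    def __init__(self, n_freq=513, n_latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_freq, 128), nn.Tanh())
        self.mu = nn.Linear(128, n_latent)
        self.logvar = nn.Linear(128, n_latent)
        # The decoder outputs log-power of clean speech per frequency bin.
        self.dec = nn.Sequential(nn.Linear(n_latent, 128), nn.Tanh(),
                                 nn.Linear(128, n_freq))

    def forward(self, x):                  # x: (batch, n_freq) power-spectrum frames
        h = self.enc(torch.log(x + 1e-8))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar
```

Trained on clean speech only, such a decoder plays the role of the nonlinear generative model that replaces the linear combination of basis spectra assumed by NMF.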

Proceedings Article
01 Dec 2015
TL;DR: Inspired by how humans read spectrograms, the proposed model first scans the frequency bands to generate a summary of the spectral information, and then uses that summary (the output-layer activations of the frequency scan) as the input to a traditional time LSTM (T-LSTM).
Abstract: Long short-term memory (LSTM) recurrent neural networks (RNNs) have recently shown significant performance improvements over deep feed-forward neural networks (DNNs). A key aspect of these models is the use of time recurrence, combined with a gating architecture that ameliorates the vanishing gradient problem. Inspired by human spectrogram reading, in this paper we propose an extension to LSTMs that performs the recurrence in frequency as well as in time. This model first scans the frequency bands to generate a summary of the spectral information, and then uses the output layer activations as the input to a traditional time LSTM (T-LSTM). Evaluated on a Microsoft short message dictation task, the proposed model obtained a 3.6% relative word error rate reduction over the T-LSTM.

115 citations
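A minimal sketch of the frequency-then-time recurrence described above, in PyTorch: one LSTM scans the frequency bands of each frame, and its final state, summarizing the spectrum, feeds a conventional time LSTM. The band width and hidden sizes are illustrative assumptions, and the number of frequency bins is assumed divisible by the band width:

```python
import torch
import torch.nn as nn

class FTLSTM(nn.Module):
    def __init__(self, band=8, f_hidden=64, t_hidden=128):
        super().__init__()
        self.band = band
        self.f_lstm = nn.LSTM(band, f_hidden, batch_first=True)      # scans frequency
        self.t_lstm = nn.LSTM(f_hidden, t_hidden, batch_first=True)  # scans time

    def forward(self, spec):               # spec: (batch, time, freq)
        b, t, f = spec.shape
        # Split each frame into consecutive frequency bands (f must divide by band).
        bands = spec.reshape(b * t, f // self.band, self.band)
        _, (h, _) = self.f_lstm(bands)     # h[-1]: summary of the frequency scan
        summary = h[-1].reshape(b, t, -1)  # one spectral summary per frame
        out, _ = self.t_lstm(summary)      # traditional T-LSTM over frames
        return out

model = FTLSTM()
print(model(torch.randn(2, 100, 40)).shape)   # torch.Size([2, 100, 128])
```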

Proceedings Article
01 Oct 2013
TL;DR: It is demonstrated that the proposed spectro-temporal features achieve better recognition accuracy than MFCCs.
Abstract: In this contribution, an acoustic event detection system based on spectro-temporal features, with a two-layer hidden Markov model as back-end, is proposed within the framework of the IEEE AASP challenge 'Detection and Classification of Acoustic Scenes and Events' (D-CASE). Noise reduction based on the log-spectral amplitude estimator of [1] and the noise power density estimation of [2] is used for signal enhancement. Performance is compared for three different kinds of features, namely amplitude modulation spectrograms, Gabor filterbank features, and conventional Mel-frequency cepstral coefficients (MFCCs), all of them known from automatic speech recognition (ASR). The evaluation is based on the office live recordings provided within the D-CASE challenge. The influence of the signal enhancement is investigated, and the increase in recognition rate achieved by the proposed features in comparison to MFCC features is shown. It is demonstrated that the proposed spectro-temporal features achieve better recognition accuracy than MFCCs.

114 citations
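As an illustration of one of the compared feature types, here is a rough sketch of an amplitude modulation spectrogram: band envelopes are taken from a first STFT, and a second FFT analyzes how each band's envelope fluctuates over time. The window lengths, hop sizes, and file name are illustrative assumptions, not the settings of the challenge submission:

```python
import numpy as np
import librosa

def modulation_spectrogram(y, n_fft=512, hop=160, mod_win=32):
    # First analysis: band envelopes over time.
    env = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    # Second analysis: FFT over short blocks of each band's envelope.
    blocks = librosa.util.frame(env, frame_length=mod_win, hop_length=mod_win // 2)
    return np.abs(np.fft.rfft(blocks, axis=1))  # (bands, modulation bins, blocks)

y, sr = librosa.load("office_recording.wav", sr=16000)   # hypothetical clip
ams = modulation_spectrogram(y)
print(ams.shape)
```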

Journal Article
TL;DR: A CNN architecture that learns representations using sample-level filters, beyond typical frame-level input representations, is proposed; it is extended with a multi-level and multi-scale feature aggregation technique, and transfer learning is subsequently conducted for several music classification tasks.
Abstract: Convolutional Neural Networks (CNNs) have been applied to diverse machine learning tasks for different modalities of raw data in an end-to-end fashion. In the audio domain, a raw waveform-based approach has been explored to directly learn hierarchical characteristics of audio. However, the majority of previous studies have limited their model capacity by taking a frame-level structure similar to short-time Fourier transforms. We previously proposed a CNN architecture that learns representations using sample-level filters beyond typical frame-level input representations. The architecture showed performance comparable to the spectrogram-based CNN model in music auto-tagging. In this paper, we extend the previous work in three ways. First, considering that the sample-level model requires much longer training time, we progressively downsample the input signals and examine how this affects performance. Second, we extend the model using a multi-level and multi-scale feature aggregation technique and subsequently conduct transfer learning for several music classification tasks. Finally, we visualize filters learned by the sample-level CNN in each layer to identify hierarchically learned features and show that they are sensitive to log-scaled frequency.

114 citations
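To illustrate the sample-level idea, here is a minimal sketch of a 1-D CNN with very small filters applied directly to the raw waveform; the depth, channel widths, and number of output tags are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # Small filter, then pooling by 3: each block shrinks time by a factor of 3.
    return nn.Sequential(nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                         nn.BatchNorm1d(c_out), nn.ReLU(), nn.MaxPool1d(3))

class SampleCNN(nn.Module):
    def __init__(self, n_tags=50):
        super().__init__()
        self.front = nn.Conv1d(1, 64, kernel_size=3, stride=3)  # sample-level front end
        self.body = nn.Sequential(*[block(64, 64) for _ in range(7)])
        self.head = nn.Linear(64, n_tags)

    def forward(self, wav):                # wav: (batch, 1, samples)
        h = self.body(self.front(wav))
        return torch.sigmoid(self.head(h.mean(dim=-1)))   # tag probabilities

model = SampleCNN()
x = torch.randn(2, 1, 59049)   # 3**10 samples, ~2.7 s at 22,050 Hz
print(model(x).shape)          # torch.Size([2, 50])
```

The tiny stride-3 front-end filter operates on individual samples rather than STFT-like frames, which is the contrast with frame-level models drawn in the abstract.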


Network Information
Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations (79% related)
- Convolutional neural network: 74.7K papers, 2M citations (78% related)
- Feature extraction: 111.8K papers, 2.1M citations (77% related)
- Wavelet: 78K papers, 1.3M citations (76% related)
- Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593