Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published on this topic, receiving 81,547 citations.


Papers
Journal Article
01 Jan 2020
TL;DR: In this paper, a mixed frequency masking data augmentation method is proposed for audio classification, which adopts a nonlinear combination method to construct new samples and a linear method for constructing labels.
Abstract: Deep learning focuses on the representation of the input data and the generalization of the model. It is well known that data augmentation can combat overfitting and improve the generalization ability of deep neural networks. In this paper, we summarize and compare multiple data augmentation methods for audio classification. These strategies include traditional methods applied to the raw audio signal, as well as the currently popular augmentations based on linear interpolation and nonlinear mixing of the spectrum. For each data augmentation method, we examine how new samples are generated, how labels are transformed, and how samples and labels are combined. Finally, inspired by SpecAugment and Mixup, we propose an effective and easy-to-implement data augmentation method, which we call Mixed frequency Masking data augmentation. This method adopts a nonlinear combination method to construct new samples and a linear method to construct labels. All methods are evaluated on the Freesound Dataset Kaggle2018 dataset, with ResNet as the classifier. The baseline system uses log-mel spectrogram features as input. We use mean Average Precision @3 (mAP@3) as the evaluation metric for all data augmentation methods.
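
The abstract does not spell out the exact mixing rule, so the following is only one possible reading of "nonlinear combination for samples, linear combination for labels", not the authors' implementation. It assumes a cut-and-paste mix: a random band of frequency bins in one spectrogram is replaced by the same bins from a second spectrogram, and the labels are blended in proportion to the share of bins taken from each source. The function name mixed_frequency_mask and the max_band parameter are hypothetical.

```python
import random


def mixed_frequency_mask(spec_a, spec_b, label_a, label_b, max_band=8, rng=random):
    """Replace a random band of frequency bins in spec_a with the bins of spec_b.

    spec_a, spec_b: lists of rows, one row per frequency bin (same shape).
    label_a, label_b: one-hot or soft label vectors of equal length.
    Returns (mixed_spec, mixed_label, (start, width)).
    Assumption: this cut-and-paste rule is illustrative, not taken from the paper.
    """
    n_bins = len(spec_a)
    if n_bins != len(spec_b):
        raise ValueError("spectrograms must have the same number of frequency bins")
    width = rng.randint(0, min(max_band, n_bins))  # band width in bins
    start = rng.randint(0, n_bins - width)         # first replaced bin
    # Nonlinear (cut-and-paste) combination of the two inputs.
    mixed = [
        list(spec_b[f]) if start <= f < start + width else list(spec_a[f])
        for f in range(n_bins)
    ]
    # Linear combination of the labels, weighted by the share of bins per source.
    lam = 1.0 - width / n_bins
    label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label, (start, width)
```

Tying the label weight to the masked fraction mirrors how Mixup blends labels, while the sample itself is never averaged; other weightings would be equally defensible under the abstract's wording.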

37 citations

Journal Article
TL;DR: An approach to extracting features from EEG signals is proposed, based on spectrograms that use the STFT to obtain time-frequency representations. It is evaluated on the dataset from Bonn University, with the main task being to distinguish between the healthy-person and epileptic-attack classes.
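
As a rough illustration of how an STFT turns a one-dimensional signal into a time-frequency representation, here is a minimal dependency-free sketch. The frame length, hop size, and Hann window are assumptions chosen for the example, not settings from the paper, and the naive DFT loop would normally be replaced by an FFT routine.

```python
import cmath
import math


def stft_spectrogram(signal, frame_len=64, hop=32):
    """Return a power spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 power values (bin 0 up to Nyquist).
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; an FFT would be used in practice.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames
```

For a 32 Hz sine sampled at 256 Hz, the bin spacing with these defaults is 4 Hz, so the energy of every frame concentrates in bin 8. Features for a classifier could then be derived from such frames, for example by taking the logarithm of each value.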

37 citations

Proceedings Article
25 Oct 1994
TL;DR: A general methodology providing a better readability of any bilinear distribution, referred to as reassignment, is essentially a generalization of an improvement of the spectrogram proposed by Kodera, Gendrin and de Villedary (1978).
Abstract: A general methodology providing better readability of any bilinear distribution has been proposed. This methodology, referred to as reassignment, is essentially a generalization of an improvement of the spectrogram proposed by Kodera, Gendrin and de Villedary (1978). After a presentation of this original work, its generalization to a wide range of distributions is shown. The close connections of this method with some related approaches are also underlined.

37 citations

Journal Article
TL;DR: Two novel algorithms based on GLA and the alternating direction method of multipliers (ADMM) are proposed for better phase recovery with fewer iterations.
Abstract: Recovering a signal from its amplitude spectrogram, or phase recovery, has many applications in acoustic signal processing. When only an amplitude spectrogram is available and no explicit information is given for the phases, the Griffin–Lim algorithm (GLA) is one of the most widely used methods for phase recovery. However, GLA often requires many iterations and results in low perceptual quality in some cases. In this letter, we propose two novel algorithms based on GLA and the alternating direction method of multipliers (ADMM) for better recovery with fewer iterations. Some interpretations of the existing methods and their relation to the proposed methods are also provided. Evaluations are performed with both an objective measure and a subjective test.
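
For orientation, the baseline GLA iteration mentioned above is commonly written as an alternation between two projections. The notation below is an assumption made for this note ($A$ is the given amplitude spectrogram, $\mathcal{G}$ the STFT, $\mathcal{G}^{\dagger}$ its pseudo-inverse, i.e. the inverse STFT); the ADMM-based variants proposed in the letter are not reproduced here.

```latex
\begin{align}
  X^{[m+1]} &= P_{\mathcal{C}}\!\left( P_{\mathcal{A}}\!\left( X^{[m]} \right) \right), \\
  P_{\mathcal{C}}(X) &= \mathcal{G}\,\mathcal{G}^{\dagger} X, \\
  P_{\mathcal{A}}(X) &= A \odot X \oslash |X| .
\end{align}
```

Here $P_{\mathcal{A}}$ replaces the magnitude of every time-frequency bin with the target amplitude while keeping the current phase, and $P_{\mathcal{C}}$ maps the result back onto the set of spectrograms that correspond to an actual time-domain signal; $\odot$ and $\oslash$ denote element-wise multiplication and division.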

37 citations

Journal Article
TL;DR: A new method for VFR using the norm of the derivative parameters in deciding to retain or to discard a frame is introduced, and informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal.
Abstract: Variable frame rate (VFR) analysis is a technique used in speech processing and recognition for discarding frames that are too much alike. The article introduces a new method for VFR. Instead of calculating the distance between frames, the norm of the derivative parameters is used in deciding whether to retain or discard a frame. Informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal. Experimental results with a hidden Markov model (HMM) based system show that the new method outperforms the classical method.
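
The abstract gives only the general idea, so the sketch below is a simplified illustration rather than the article's algorithm. It assumes the derivative parameters are approximated by a central difference between neighbouring frames and that a frame is retained when the Euclidean norm of that derivative reaches a fixed threshold; both the differencing scheme and the threshold rule are assumptions, and the function names are hypothetical.

```python
import math


def delta_norms(features):
    """Euclidean norm of a central-difference derivative for each frame.

    features: list of frames, each a list of floats (e.g. cepstral coefficients).
    Edge frames reuse themselves as the missing neighbour.
    """
    last = len(features) - 1
    norms = []
    for t in range(len(features)):
        prev_f = features[max(t - 1, 0)]
        next_f = features[min(t + 1, last)]
        norms.append(math.sqrt(sum(((b - a) / 2.0) ** 2
                                   for a, b in zip(prev_f, next_f))))
    return norms


def select_frames(features, threshold):
    """Return indices of frames whose derivative norm is at least `threshold`."""
    return [t for t, d in enumerate(delta_norms(features)) if d >= threshold]
```

With this rule, stationary stretches, where consecutive frames barely change, fall below the threshold and are dropped, while frames in transient regions are kept.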

37 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
79% related
Convolutional neural network
74.7K papers, 2M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
77% related
Wavelet
78K papers, 1.3M citations
76% related
Support vector machine
73.6K papers, 1.7M citations
75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593