Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
Yu Wu1, Hua Mao1, Zhang Yi1
TL;DR: A task-independent model, called FreqCNN, is proposed to automatically extract distinctive features from each frequency band using convolutional kernels, and an attention mechanism is introduced to systematically enhance the features from certain frequency bands.
Abstract: Audio classification, as a set of important and challenging tasks, groups speech signals according to speakers’ identities, accents, and emotional states. Due to the high dimensionality of audio data, task-specific hand-crafted feature extraction is usually required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationship among features has not been fully exploited. In this paper, the original speech signal is first represented as a spectrogram and then split along the frequency domain to form a frequency-distributed spectrogram. This paper proposes a task-independent model, called FreqCNN, to automatically extract distinctive features from each frequency band by using convolutional kernels. Furthermore, an attention mechanism is introduced to systematically enhance the features from certain frequency bands. The proposed FreqCNN is evaluated on three publicly available speech databases through three independent classification tasks. The obtained results demonstrate superior performance over the state of the art.

36 citations
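
The frequency-splitting step described in the abstract above can be illustrated with a minimal sketch. It is not the FreqCNN implementation: the frame length, hop size, number of bands, and the helper names `magnitude_spectrogram` and `split_frequency_bands` are all assumptions made for illustration, and the convolutional and attention stages are omitted.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier magnitude, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def split_frequency_bands(spec, n_bands=4):
    """Split a (freq_bins, frames) spectrogram into contiguous bands
    along the frequency axis (hypothetical helper, equal-width bands)."""
    return np.array_split(spec, n_bands, axis=0)

# Toy input: one second of a 440 Hz tone sampled at 8 kHz (assumed values).
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

spec = magnitude_spectrogram(signal)
bands = split_frequency_bands(spec, n_bands=4)
band_energy = [float((b ** 2).sum()) for b in bands]
```

Each element of `bands` could then be fed to its own feature extractor; here the tone's energy ends up in the lowest band.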

Proceedings ArticleDOI
05 Nov 2015
TL;DR: The method proposed to detect the signature of wheezes imposes a temporal Gaussian regularization and a reduction of the false positives based on the (geodesic) morphological opening by reconstruction operator.
Abstract: In this work, thirty features were tested in order to identify the best feature set for the robust detection of wheezes. The features include the detection of the wheeze signature in the spectrogram space (WS-SS) and twenty-nine musical features usually used in the context of Music Information Retrieval. The method proposed to detect the signature of wheezes imposes a temporal Gaussian regularization and a reduction of the false positives based on the (geodesic) morphological opening by reconstruction operator. Our dataset contains wheezes, crackles and normal breath sounds. Four selection algorithms were used to rank the features. The performance of the features was assessed using the Matthews correlation coefficient (MCC). All the selection algorithms ranked the WS-SS feature as the most important. A significant boost in performance was obtained by using around ten features. This improvement was independent of the selection algorithm. The use of more than ten features only allows for a small increase of the MCC value.

36 citations
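
For readers unfamiliar with the Matthews correlation coefficient used as the evaluation metric in the abstract above, the following sketch computes it from binary confusion-matrix counts. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions for this example; the counts are made up and are not results from the paper.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal sum is zero (a common convention,
    assumed here), otherwise a value in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

perfect = mcc(tp=50, tn=50, fp=0, fn=0)     # every prediction correct
chance = mcc(tp=25, tn=25, fp=25, fn=25)    # no better than guessing
inverted = mcc(tp=0, tn=0, fp=50, fn=50)    # every prediction wrong
```

Unlike plain accuracy, the MCC uses all four confusion-matrix cells, which makes it a reasonable choice when classes such as wheezes and normal breath sounds are imbalanced.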

Journal Article
TL;DR: A novel procedure for data-driven enhancement of an informative signal that models each sub-signal in the time-frequency representation by an α-stable distribution, which generalizes the standard Gaussian distribution and allows modeling of sub-signals related to both informative and non-informative frequencies.
Abstract: A novel procedure for data-driven enhancement of an informative signal is presented in this paper. The introduced methodology covers decomposition of the signal via a time-frequency spectrogram into a set of narrowband sub-signals. Furthermore, each of the sub-signals is considered as a sample of independent identically distributed random variables, and we model the distribution of the sample, in contrast to the classical methodology where a simple statistic, for example kurtosis, was calculated for each sub-signal. This approach provides a new perspective on signal processing techniques for local damage detection. Using our methodology, one can eliminate the potential risk related to high sensitivity towards a single outlier. In the proposed procedure, we model each sub-signal in the time-frequency representation by an α-stable distribution. This distribution is a generalization of the standard Gaussian one and allows us to model sub-signals related to both informative and non-informative frequencies. As a result, we obtain the distribution of the stability parameter versus frequency, which is analogous to the spectral kurtosis approach well known in the literature. This characteristic is the basis for the design of a filter used for raw signal enhancement. To evaluate the efficiency of our method, we compare the raw and filtered signals in the time, time-frequency and frequency (envelope spectrum) domains. Moreover, we present a comparison to the spectral kurtosis approach. The presented methodology was applied to a simulated signal and to a real vibration signal from a two-stage heavy-duty gearbox used in the mining industry.

36 citations
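
The classical baseline that the abstract above contrasts itself with, a kurtosis-type statistic computed for each narrowband sub-signal, can be sketched as follows. This is not the α-stable procedure of the paper. It uses the common spectral kurtosis estimator E|X|^4 / (E|X|^2)^2 - 2 on STFT coefficients; the frame length, hop size, and the synthetic test signals are assumptions chosen only for illustration.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Complex STFT, shape (freq_bins, frames): one narrowband
    sub-signal per row."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def spectral_kurtosis(signal, frame_len=256, hop=128):
    """Per-frequency spectral kurtosis: E|X|^4 / (E|X|^2)^2 - 2.

    Close to 0 for stationary Gaussian noise at bins away from DC and
    Nyquist; larger where the sub-signal is impulsive.
    """
    power = np.abs(stft(signal, frame_len, hop)) ** 2
    return np.mean(power ** 2, axis=1) / np.mean(power, axis=1) ** 2 - 2.0

# Synthetic data (assumed): white noise, with and without periodic impulses.
rng = np.random.default_rng(0)
noise = rng.standard_normal(65536)
impulsive = noise.copy()
impulsive[4096::4096] += 50.0

sk_noise = spectral_kurtosis(noise)
sk_impulsive = spectral_kurtosis(impulsive)
```

Because the statistic rests on fourth moments, a handful of large samples can dominate it, which is the outlier sensitivity the abstract refers to.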

Patent
03 Apr 1945
TL;DR: The invention concerns the analysis of complex waves, such as speech waves, and in particular the production of complex-wave spectrograms.
Abstract: This invention relates to the analysis of complex waves and more particularly to the production of complex-wave spectrograms. It has been proposed heretofore to record complex waves, such as speech waves for typical example, in the form of a spectrogram or pattern the dimensions of which have...

36 citations

Proceedings ArticleDOI
08 Dec 2009
TL;DR: New methods that extract characteristic features from speech magnitude spectrograms through a bank of 12 log-Gabor filters and an optimal feature selection procedure based on mutual information criteria are presented.
Abstract: We present new methods that extract characteristic features from speech magnitude spectrograms. Two of the presented approaches have been found particularly efficient in the process of automatic stress and emotion classification. In the first approach, the spectrograms are sub-divided into ERB frequency bands and the average energy for each band is calculated. In the second approach, the spectrograms are passed through a bank of 12 log-Gabor filters and the outputs are averaged and passed through an optimal feature selection procedure based on mutual information criteria. The proposed methods were tested using single vowels, words and sentences from SUSAS data base with 3 classes of stress, and spontaneous speech recordings made by psychologists (ORI) with 5 emotional classes. The classification results based on the Gaussian mixture model show correct classification rates of 40%-81%, for different SUSAS data sets and 40%-53.4% for the ORI data base.

36 citations
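
The first approach in the abstract above, averaging spectrogram energy within ERB frequency bands, might look roughly like the sketch below. The abstract does not say how the bands were laid out, so this example assumes the Glasberg and Moore ERB-rate scale, eight bands, and band edges equally spaced on that scale; `erb_band_energies` is a hypothetical helper and the input is a synthetic power spectrogram.

```python
import numpy as np

def hz_to_erb_rate(f):
    """ERB-rate scale (Glasberg & Moore approximation)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_band_energies(power_spec, sr, n_bands=8):
    """Mean energy per ERB band for a (freq_bins, frames) power
    spectrogram whose rows span 0 .. sr/2. Empty bands yield 0.0."""
    n_bins = power_spec.shape[0]
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = erb_rate_to_hz(
        np.linspace(0.0, hz_to_erb_rate(sr / 2.0), n_bands + 1))
    # Assign each frequency bin to the band whose edges enclose it.
    band_index = np.clip(
        np.searchsorted(edges, freqs, side="right") - 1, 0, n_bands - 1)
    return np.array([
        power_spec[band_index == b].mean() if np.any(band_index == b) else 0.0
        for b in range(n_bands)
    ])

# Synthetic power spectrogram (assumed): energy only in bin 10 (312.5 Hz).
sr = 8000
power_spec = np.zeros((129, 10))
power_spec[10] = 1.0
energies = erb_band_energies(power_spec, sr, n_bands=8)
```

Because ERB bands are narrow at low frequencies and wide at high frequencies, the resulting feature vector emphasizes the low-frequency detail that matters most for speech.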


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593