Topic

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.
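For readers new to the topic: a spectrogram is the squared-magnitude short-time Fourier transform of a signal, laid out as a frequency-by-time grid of power values. A minimal sketch using SciPy is shown below; the tone frequency, window length, and overlap are arbitrary illustrative choices, not values taken from any paper listed on this page.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative signal: a 1 kHz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

# Short-time Fourier analysis: power in each (frequency, time) cell
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-10)   # log-power spectrogram in dB
print(log_spec.shape)                   # (frequency bins, time frames)
```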


Papers
Proceedings ArticleDOI
04 May 2020
TL;DR: A CNN front-end fitted with the proposed convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb benchmark, showing that simultaneously modelling temporal and frequency attention translates to better real-world performance.
Abstract: The majority of recent approaches for text-independent speaker recognition apply attention or similar techniques for aggregation of frame-level feature descriptors generated by a deep neural network (DNN) front-end. In this paper, we propose methods of convolutional attention for independently modelling temporal and frequency information in a convolutional neural network (CNN) based front-end. Our system utilizes convolutional block attention modules (CBAMs) [1] appropriately modified to accommodate spectrogram inputs. The proposed CNN front-end fitted with these convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb [2], [3] speaker verification benchmark. Our best model achieves an equal error rate of 2.031% on the VoxCeleb1 test set, a considerable improvement over comparable state-of-the-art results. For a more thorough assessment of the effects of frequency and temporal attention in real-world conditions, we conduct ablation experiments by randomly dropping frequency bins and temporal frames from the input spectrograms, concluding that modelling temporal and frequency attention simultaneously, rather than either one alone, translates to better real-world performance.
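The paper's exact architecture is not reproduced here, so the following is only a rough PyTorch sketch of what separate frequency and temporal attention over a spectrogram feature map could look like, loosely following the CBAM recipe of average/max pooling followed by a small convolution and a sigmoid gate. The module name, kernel size, and pooling choices are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FreqTimeAttention(nn.Module):
    """Hypothetical CBAM-style attention that gates a (batch, channels,
    frequency, time) feature map along the frequency and time axes separately."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.freq_conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)
        self.time_conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                # x: (B, C, F, T)
        # Frequency attention: pool over channels and time, keep the F axis
        f_desc = torch.stack([x.mean(dim=(1, 3)), x.amax(dim=(1, 3))], dim=1)
        f_gate = torch.sigmoid(self.freq_conv(f_desc))   # (B, 1, F)
        x = x * f_gate.unsqueeze(-1)                     # broadcast over C and T

        # Temporal attention: pool over channels and frequency, keep the T axis
        t_desc = torch.stack([x.mean(dim=(1, 2)), x.amax(dim=(1, 2))], dim=1)
        t_gate = torch.sigmoid(self.time_conv(t_desc))   # (B, 1, T)
        return x * t_gate.unsqueeze(2)                   # broadcast over C and F
```

Each gate is broadcast back over the remaining axes, so the frequency gate reweights every time frame equally and vice versa.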

45 citations

Journal ArticleDOI
TL;DR: A novel method for abnormal heart sound detection using temporal quasi-periodic features and long short-term memory, without segmentation, is proposed; the results indicate that the proposed method is competitive with state-of-the-art abnormal heart sound detection methods.
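Only the TL;DR is available above, so the sketch below is a generic, hypothetical reading of the setup: frame-level spectral features of an unsegmented phonocardiogram recording fed to an LSTM whose final hidden state is classified as normal or abnormal. The feature dimension, hidden size, and output head are placeholders, not details from the paper.

```python
import torch.nn as nn

class HeartSoundLSTM(nn.Module):
    """Hypothetical recording-level classifier: no heart-cycle segmentation,
    the LSTM consumes the whole sequence of spectral feature frames."""
    def __init__(self, n_features=40, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # normal vs. abnormal

    def forward(self, frames):                  # frames: (batch, time, n_features)
        _, (h_n, _) = self.lstm(frames)
        return self.head(h_n[-1])               # one logit pair per recording
```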

45 citations

Proceedings ArticleDOI
19 Jan 2021
TL;DR: In this article, the effectiveness of log-Mel spectrogram and MFCC features for Alzheimer's dementia (AD) recognition on the ADReSS challenge dataset was explored using three different deep neural networks (DNNs) for AD recognition and mini-mental state examination (MMSE) score prediction.
Abstract: In this work, we explore the effectiveness of log-Mel spectrogram and MFCC features for Alzheimer's dementia (AD) recognition on the ADReSS challenge dataset. We use three different deep neural networks (DNNs) for AD recognition and mini-mental state examination (MMSE) score prediction: (i) a convolutional neural network followed by a long short-term memory network (CNN-LSTM), (ii) a pre-trained ResNet18 network followed by an LSTM (ResNet-LSTM), and (iii) a pyramidal bidirectional LSTM followed by a CNN (pBLSTM-CNN). CNN-LSTM achieves an accuracy of 64.58% with MFCC features and ResNet-LSTM achieves an accuracy of 62.5% using log-Mel spectrograms. pBLSTM-CNN and ResNet-LSTM achieve root mean square errors (RMSE) of 5.9 and 5.98 in MMSE score prediction using the log-Mel spectrograms. Our results beat the baseline accuracy (62.5%) and RMSE (6.14) reported for acoustic features on the ADReSS challenge dataset. The results suggest that log-Mel spectrograms and MFCCs are effective features for the AD recognition problem when used with DNN models.
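For reference, log-Mel spectrograms and MFCCs of the kind described above can be computed with librosa; the frame length, hop size, and number of bands below are common defaults and are not taken from the paper, and the file path is a placeholder.

```python
import librosa

# Load a speech recording (path is a placeholder)
y, sr = librosa.load("speech.wav", sr=16000)

# Log-Mel spectrogram: mel-filtered power spectrum on a dB scale
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)              # shape: (n_mels, frames)

# MFCCs: discrete cosine transform of the log-Mel energies
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=160)
print(log_mel.shape, mfcc.shape)
```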

45 citations

Journal ArticleDOI
TL;DR: This research applies the chirplet as a tool to analyze dispersive wave signals based on a dispersion model and demonstrates the effectiveness and robustness of the algorithm on real, experimentally measured Lamb wave signals through an adaptation of a correlation technique developed in previous research.
Abstract: Time-frequency representations, like the spectrogram or the scalogram, are widely used to characterize dispersive waves. The resulting energy distributions, however, suffer from the uncertainty principle, which complicates the allocation of energy to individual propagation modes (especially when the dispersion curves of these modes are close to each other in the time-frequency domain). This research applies the chirplet as a tool to analyze dispersive wave signals based on a dispersion model. The chirplet transform, a generalization of both the wavelet and the short-time Fourier transform, enables the extraction of components of a signal with a particular instantaneous frequency and group delay. An adaptive algorithm identifies frequency regions for which quantitative statements can be made about an individual mode’s energy, and employs chirplets (locally adapted to a dispersion curve model) to extract the (proportional) energy distribution of that single mode from a multimode dispersive wave signal. The ...
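To make the chirplet idea concrete, the sketch below correlates a signal with a single Gaussian-windowed chirplet atom of chosen center time, center frequency, and chirp rate; the adaptive, dispersion-model-driven selection of these parameters described in the abstract is not reproduced, and all numeric values are arbitrary.

```python
import numpy as np

def chirplet(t, t_c, f_c, c, sigma):
    """Gaussian-windowed chirplet atom: center time t_c (s), center frequency
    f_c (Hz), chirp rate c (Hz/s), Gaussian width sigma (s)."""
    tau = t - t_c
    window = np.exp(-0.5 * (tau / sigma) ** 2)
    phase = 2 * np.pi * (f_c * tau + 0.5 * c * tau ** 2)
    return window * np.exp(1j * phase)

# Illustrative signal: a linear chirp sweeping upward from 500 Hz at 1 kHz/s
fs = 10_000
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * (500 * t + 0.5 * 1000 * t ** 2))

# Projecting onto one atom measures the signal's energy near that
# instantaneous frequency and group delay
atom = chirplet(t, t_c=0.5, f_c=1000.0, c=1000.0, sigma=0.05)
coeff = np.vdot(atom, x) / fs
print(abs(coeff))
```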

45 citations

Journal ArticleDOI
TL;DR: A weighted hybrid binary representation (WHBR) method is proposed that converts the regression prediction process into a weighted combination of multiple binary classification problems, greatly reducing training time and improving prediction accuracy.
Abstract: Music emotion recognition, which enables effective and efficient music organization and retrieval, is a challenging subject in the field of music information retrieval. In this paper, we propose a new bidirectional convolutional recurrent sparse network (BCRSN) for music emotion recognition based on convolutional neural networks and recurrent neural networks. Our model adaptively learns the sequential-information-included affect-salient features (SII-ASF) from the 2-D time–frequency representation (i.e., spectrogram) of music audio signals. By combining feature extraction, ASF selection, and emotion prediction, the BCRSN can achieve continuous emotion prediction of audio files. To reduce the high computational complexity caused by the numerical-type ground truth, we propose a weighted hybrid binary representation (WHBR) method that converts the regression prediction process into a weighted combination of multiple binary classification problems. We test our method on two benchmark databases, that is, the Database for Emotional Analysis in Music and MoodSwings Turk. The results show that the WHBR method can greatly reduce the training time and improve the prediction accuracy. The extracted SII-ASF is robust to genre, timbre, and noise variation and is sensitive to emotion. It achieves significant improvement compared to the best performing feature sets in MediaEval 2015. Meanwhile, extensive experiments demonstrate that the proposed method outperforms the state-of-the-art methods.
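One plausible reading of the WHBR idea (an assumption here, not the authors' published formulation) is an ordinal-style decomposition: threshold the continuous emotion label at several levels, train one binary classifier per threshold, and recover a continuous prediction as a weighted sum of the binary outputs. The thresholds, weights, and classifier below are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_binary_stack(X, y, thresholds):
    """One binary classifier per threshold on the continuous target y."""
    return [LogisticRegression(max_iter=1000).fit(X, (y > th).astype(int))
            for th in thresholds]

def predict_weighted(models, X, weights, offset):
    """Weighted combination of binary probabilities -> continuous estimate."""
    probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return probs @ weights + offset

# Toy data: random features standing in for spectrogram descriptors,
# continuous emotion label in roughly [-1, 1]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = np.tanh(X[:, 0] + 0.1 * rng.normal(size=200))

thresholds = np.linspace(-0.6, 0.6, 5)
weights = np.full(5, (y.max() - y.min()) / 5)   # crude equal weighting
models = fit_binary_stack(X, y, thresholds)
y_hat = predict_weighted(models, X, weights, offset=y.min())
print(np.corrcoef(y, y_hat)[0, 1])
```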

45 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2024: 1
2023: 627
2022: 1,396
2021: 488
2020: 595
2019: 593