Topic

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.
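For readers new to the topic: a spectrogram is the squared-magnitude short-time Fourier transform of a signal, laid out as a frequency-by-time grid of power values. A minimal sketch using SciPy is shown below; the tone frequency, window length, and overlap are arbitrary illustrative choices, not values taken from any paper listed on this page.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative signal: a 1 kHz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

# Short-time Fourier analysis: power in each (frequency, time) cell
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-10)   # log-power spectrogram in dB
print(log_spec.shape)                   # (frequency bins, time frames)
```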


Papers
Proceedings ArticleDOI
04 May 2020
TL;DR: A CNN front-end fitted with the proposed convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb benchmark, showing that simultaneously modelling temporal and frequency attention translates to better real-world performance.
Abstract: The majority of recent approaches for text-independent speaker recognition apply attention or similar techniques for aggregation of frame-level feature descriptors generated by a deep neural network (DNN) front-end. In this paper, we propose methods of convolutional attention for independently modelling temporal and frequency information in a convolutional neural network (CNN) based front-end. Our system utilizes convolutional block attention modules (CBAMs) [1] appropriately modified to accommodate spectrogram inputs. The proposed CNN front-end fitted with these convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb [2], [3] speaker verification benchmark. Our best model achieves an equal error rate of 2.031% on the VoxCeleb1 test set, a considerable improvement over comparable state-of-the-art results. For a more thorough assessment of the effects of frequency and temporal attention in real-world conditions, we conduct ablation experiments by randomly dropping frequency bins and temporal frames from the input spectrograms, concluding that modelling temporal and frequency attention simultaneously, rather than either one alone, translates to better real-world performance.
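The paper's exact architecture is not reproduced here, so the following is only a rough PyTorch sketch of what separate frequency and temporal attention over a spectrogram feature map could look like, loosely following the CBAM recipe of average/max pooling followed by a small convolution and a sigmoid gate. The module name, kernel size, and pooling choices are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FreqTimeAttention(nn.Module):
    """Hypothetical CBAM-style attention that gates a (batch, channels,
    frequency, time) feature map along the frequency and time axes separately."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.freq_conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)
        self.time_conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                # x: (B, C, F, T)
        # Frequency attention: pool over channels and time, keep the F axis
        f_desc = torch.stack([x.mean(dim=(1, 3)), x.amax(dim=(1, 3))], dim=1)
        f_gate = torch.sigmoid(self.freq_conv(f_desc))   # (B, 1, F)
        x = x * f_gate.unsqueeze(-1)                     # broadcast over C and T

        # Temporal attention: pool over channels and frequency, keep the T axis
        t_desc = torch.stack([x.mean(dim=(1, 2)), x.amax(dim=(1, 2))], dim=1)
        t_gate = torch.sigmoid(self.time_conv(t_desc))   # (B, 1, T)
        return x * t_gate.unsqueeze(2)                   # broadcast over C and F
```

Each gate is broadcast back over the remaining axes, so the frequency gate reweights every time frame equally and vice versa.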

45 citations

Journal ArticleDOI
TL;DR: A novel method for abnormal heart sound detection using temporal quasi-periodic features and long short-term memory, without segmentation, is proposed; the results indicate that the proposed method is competitive with state-of-the-art abnormal heart sound detection methods.
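Only the TL;DR is available above, so the sketch below is a generic, hypothetical reading of the setup: frame-level spectral features of an unsegmented phonocardiogram recording fed to an LSTM whose final hidden state is classified as normal or abnormal. The feature dimension, hidden size, and output head are placeholders, not details from the paper.

```python
import torch.nn as nn

class HeartSoundLSTM(nn.Module):
    """Hypothetical recording-level classifier: no heart-cycle segmentation,
    the LSTM consumes the whole sequence of spectral feature frames."""
    def __init__(self, n_features=40, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # normal vs. abnormal

    def forward(self, frames):                  # frames: (batch, time, n_features)
        _, (h_n, _) = self.lstm(frames)
        return self.head(h_n[-1])               # one logit pair per recording
```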

45 citations

Proceedings ArticleDOI
19 Jan 2021
TL;DR: In this article, the effectiveness of log-Mel spectrogram and MFCC features for Alzheimer's dementia (AD) recognition on the ADReSS challenge dataset was explored using three different deep neural networks (DNNs) for AD recognition and mini-mental state examination (MMSE) score prediction.
Abstract: In this work, we explore the effectiveness of log-Mel spectrogram and MFCC features for Alzheimer's dementia (AD) recognition on the ADReSS challenge dataset. We use three different deep neural networks (DNNs) for AD recognition and mini-mental state examination (MMSE) score prediction: (i) a convolutional neural network followed by a long short-term memory network (CNN-LSTM), (ii) a pre-trained ResNet18 network followed by an LSTM (ResNet-LSTM), and (iii) a pyramidal bidirectional LSTM followed by a CNN (pBLSTM-CNN). CNN-LSTM achieves an accuracy of 64.58% with MFCC features and ResNet-LSTM achieves an accuracy of 62.5% using log-Mel spectrograms. pBLSTM-CNN and ResNet-LSTM achieve root mean square errors (RMSE) of 5.9 and 5.98 in MMSE score prediction using the log-Mel spectrograms. Our results beat the baseline accuracy (62.5%) and RMSE (6.14) reported for acoustic features on the ADReSS challenge dataset. The results suggest that log-Mel spectrograms and MFCCs are effective features for the AD recognition problem when used with DNN models.
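For reference, log-Mel spectrograms and MFCCs of the kind described above can be computed with librosa; the frame length, hop size, and number of bands below are common defaults and are not taken from the paper, and the file path is a placeholder.

```python
import librosa

# Load a speech recording (path is a placeholder)
y, sr = librosa.load("speech.wav", sr=16000)

# Log-Mel spectrogram: mel-filtered power spectrum on a dB scale
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)              # shape: (n_mels, frames)

# MFCCs: discrete cosine transform of the log-Mel energies
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=160)
print(log_mel.shape, mfcc.shape)
```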

45 citations

Journal ArticleDOI
TL;DR: This research applies the chirplet as a tool to analyze dispersive wave signals based on a dispersion model and demonstrates the effectiveness and robustness of the algorithm on real, experimentally measured Lamb wave signals through an adaptation of a correlation technique developed in previous research.
Abstract: Time-frequency representations, like the spectrogram or the scalogram, are widely used to characterize dispersive waves. The resulting energy distributions, however, suffer from the uncertainty principle, which complicates the allocation of energy to individual propagation modes (especially when the dispersion curves of these modes are close to each other in the time-frequency domain). This research applies the chirplet as a tool to analyze dispersive wave signals based on a dispersion model. The chirplet transform, a generalization of both the wavelet and the short-time Fourier transform, enables the extraction of components of a signal with a particular instantaneous frequency and group delay. An adaptive algorithm identifies frequency regions for which quantitative statements can be made about an individual mode’s energy, and employs chirplets (locally adapted to a dispersion curve model) to extract the (proportional) energy distribution of that single mode from a multimode dispersive wave signal. The ...
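To make the chirplet idea concrete, the sketch below correlates a signal with a single Gaussian-windowed chirplet atom of chosen center time, center frequency, and chirp rate; the adaptive, dispersion-model-driven selection of these parameters described in the abstract is not reproduced, and all numeric values are arbitrary.

```python
import numpy as np

def chirplet(t, t_c, f_c, c, sigma):
    """Gaussian-windowed chirplet atom: center time t_c (s), center frequency
    f_c (Hz), chirp rate c (Hz/s), Gaussian width sigma (s)."""
    tau = t - t_c
    window = np.exp(-0.5 * (tau / sigma) ** 2)
    phase = 2 * np.pi * (f_c * tau + 0.5 * c * tau ** 2)
    return window * np.exp(1j * phase)

# Illustrative signal: a linear chirp sweeping upward from 500 Hz at 1 kHz/s
fs = 10_000
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * (500 * t + 0.5 * 1000 * t ** 2))

# Projecting onto one atom measures the signal's energy near that
# instantaneous frequency and group delay
atom = chirplet(t, t_c=0.5, f_c=1000.0, c=1000.0, sigma=0.05)
coeff = np.vdot(atom, x) / fs
print(abs(coeff))
```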

45 citations

Journal ArticleDOI
TL;DR: A weighted hybrid binary representation (WHBR) method is proposed that converts the regression prediction process into a weighted combination of multiple binary classification problems, greatly reducing training time and improving prediction accuracy.
Abstract: Music emotion recognition, which enables effective and efficient music organization and retrieval, is a challenging subject in the field of music information retrieval. In this paper, we propose a new bidirectional convolutional recurrent sparse network (BCRSN) for music emotion recognition based on convolutional neural networks and recurrent neural networks. Our model adaptively learns the sequential-information-included affect-salient features (SII-ASF) from the 2-D time–frequency representation (i.e., spectrogram) of music audio signals. By combining feature extraction, ASF selection, and emotion prediction, the BCRSN can achieve continuous emotion prediction of audio files. To reduce the high computational complexity caused by the numerical-type ground truth, we propose a weighted hybrid binary representation (WHBR) method that converts the regression prediction process into a weighted combination of multiple binary classification problems. We test our method on two benchmark databases, that is, the Database for Emotional Analysis in Music and MoodSwings Turk. The results show that the WHBR method can greatly reduce the training time and improve the prediction accuracy. The extracted SII-ASF is robust to genre, timbre, and noise variation and is sensitive to emotion. It achieves significant improvement compared to the best performing feature sets in MediaEval 2015. Meanwhile, extensive experiments demonstrate that the proposed method outperforms the state-of-the-art methods.
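One plausible reading of the WHBR idea (an assumption here, not the authors' published formulation) is an ordinal-style decomposition: threshold the continuous emotion label at several levels, train one binary classifier per threshold, and recover a continuous prediction as a weighted sum of the binary outputs. The thresholds, weights, and classifier below are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_binary_stack(X, y, thresholds):
    """One binary classifier per threshold on the continuous target y."""
    return [LogisticRegression(max_iter=1000).fit(X, (y > th).astype(int))
            for th in thresholds]

def predict_weighted(models, X, weights, offset):
    """Weighted combination of binary probabilities -> continuous estimate."""
    probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return probs @ weights + offset

# Toy data: random features standing in for spectrogram descriptors,
# continuous emotion label in roughly [-1, 1]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = np.tanh(X[:, 0] + 0.1 * rng.normal(size=200))

thresholds = np.linspace(-0.6, 0.6, 5)
weights = np.full(5, (y.max() - y.min()) / 5)   # crude equal weighting
models = fit_binary_stack(X, y, thresholds)
y_hat = predict_weighted(models, X, weights, offset=y.min())
print(np.corrcoef(y, y_hat)[0, 1])
```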

45 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2024: 1
2023: 627
2022: 1,396
2021: 488
2020: 595
2019: 593