Topic

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
TL;DR: This paper combines different signal processing techniques with a deep learning method to denoise, compress, segment, and classify PCG signals effectively and accurately, achieving an overall testing accuracy of around 97.10%.
Abstract: Phonocardiography (PCG) is the graphical representation of heart sounds. The PCG signal contains useful information about the functionality and condition of the heart, and it provides an early indication of potential cardiac abnormalities. Extracting cardiac information from heart sounds and detecting abnormal heart sounds to diagnose heart diseases using the PCG signal can play a vital role in remote patient monitoring. In this paper, we combine different signal processing techniques and a deep learning method to denoise, compress, segment, and classify PCG signals effectively and accurately. First, the PCG signal is denoised and compressed using a multi-resolution analysis based on the Discrete Wavelet Transform (DWT). Then, a segmentation algorithm based on the Shannon energy envelope and zero-crossing is applied to segment the PCG signal into four major parts: the first heart sound (S1), the systole interval, the second heart sound (S2), and the diastole interval. Finally, the Mel-scaled power spectrogram and Mel-frequency cepstral coefficients (MFCC) are employed to extract informative features from the PCG signal, which are then fed into a classifier that labels each PCG signal as normal or abnormal using a deep learning approach. For the classification, a 5-layer feed-forward Deep Neural Network (DNN) model is used, and an overall testing accuracy of around 97.10% is achieved. Besides providing valuable information regarding heart condition, this signal processing approach can help cardiologists take appropriate and reliable steps toward diagnosis if any cardiovascular disorder is found at an early stage.
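
The final stage of this pipeline (Mel-scaled power spectrogram and MFCC features feeding a 5-layer feed-forward DNN) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling rate, number of mel bands, MFCC count, layer widths, and the use of librosa and PyTorch are our own choices.

```python
# Minimal sketch of the feature-extraction + DNN classification stage described above.
# NOT the authors' code: sampling rate, mel-band count, MFCC count, and layer sizes
# are illustrative assumptions.
import numpy as np
import librosa
import torch.nn as nn

def pcg_features(wav_path, sr=2000, n_mels=64, n_mfcc=13):
    """Load a (pre-segmented) PCG recording and return a fixed-length feature vector
    built from a Mel-scaled power spectrogram and MFCCs, averaged over time."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mel_db.mean(axis=1), mfcc.mean(axis=1)])  # (n_mels + n_mfcc,)

class PCGClassifier(nn.Module):
    """5-layer feed-forward network mapping the feature vector to normal/abnormal."""
    def __init__(self, in_dim=77):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),  # normal vs. abnormal
        )
    def forward(self, x):
        return self.net(x)
```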

48 citations

Journal ArticleDOI
07 Aug 2019
TL;DR: This paper introduces a methodology for heart disease detection based on heart sounds that employs three successive stages: spectrogram generation, deep feature extraction, and classification. The method outperformed some of the existing approaches.
Abstract: Heart sound contains various important quantities that help early detection of heart diseases. Many methods have been proposed so far in which various signal-processing techniques are applied to heart sounds for heart disease detection. In this paper, a methodology is introduced for heart disease detection based on heart sounds. The proposed method employs three successive stages: spectrogram generation, deep feature extraction, and classification. In the spectrogram generation stage, the heart sounds are converted to spectrogram images by using a time-frequency transformation. The deep features are extracted from three different pre-trained convolutional neural network models, namely AlexNet, VGG16, and VGG19. A support vector machine classifier is used in the third stage of the proposed method. The proposed method is evaluated on two datasets taken from The Classifying Heart Sounds Challenge. The obtained results are compared with some of the existing methods, and the comparisons show that the proposed method outperforms them.
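
The spectrogram-image-to-deep-features-to-SVM pipeline described here can be sketched as below. This is only an illustration under assumed parameters (STFT settings, VGG16 as the backbone, the penultimate classifier layer as the feature, torchvision's pre-trained weights); it is not the authors' code.

```python
# Illustrative sketch: heart-sound spectrogram image -> pre-trained CNN features -> SVM.
# Not the authors' code; spectrogram parameters, backbone, and feature layer are assumptions.
import numpy as np
import librosa
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pre-trained VGG16; its classifier head up to the penultimate layer yields a 4096-d feature.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
feature_head = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

to_input = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
                      T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

def deep_features(wav_path, sr=4000):
    """Convert a heart-sound recording to a spectrogram image and extract VGG16 features."""
    y, sr = librosa.load(wav_path, sr=sr)
    spec = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    img = np.uint8(255 * (spec - spec.min()) / (spec.max() - spec.min() + 1e-8))  # grayscale
    img = np.repeat(img[..., None], 3, axis=-1)                                   # 3 channels for VGG
    with torch.no_grad():
        feats = vgg.features(to_input(img).unsqueeze(0))
        feats = vgg.avgpool(feats).flatten(1)
        return feature_head(feats).squeeze(0).numpy()                             # (4096,)

# Usage sketch (training paths/labels assumed available):
# from sklearn.svm import SVC
# X = np.stack([deep_features(p) for p in train_paths])
# clf = SVC(kernel="linear").fit(X, train_labels)
```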

48 citations

Journal ArticleDOI
TL;DR: In this article, a hybrid architecture based on acoustic and deep features is proposed to increase classification accuracy in speech emotion recognition; the method consists of feature extraction, feature selection, and classification stages.
Abstract: The recognition and classification of emotions in speech is one of the most prominent research topics in human-computer interaction of the last decades. Recognizing the feelings or emotions in human conversations can have a deep impact on understanding a person's physical and psychological situation. This study proposes a novel hybrid architecture based on acoustic and deep features to increase the classification accuracy in speech emotion recognition. The proposed method consists of feature extraction, feature selection, and classification stages. First, acoustic features such as Root Mean Square energy (RMS), Mel-Frequency Cepstral Coefficients (MFCC), and Zero-Crossing Rate are obtained from the voice recordings. Subsequently, spectrogram images of the original sound signals are given as input to pre-trained deep network architectures (VGG16, ResNet18, ResNet50, ResNet101, SqueezeNet, and DenseNet201) and deep features are extracted. A hybrid feature vector is then created by combining the acoustic and deep features, and the ReliefF algorithm is used to select the more informative features from this hybrid vector. Finally, a Support Vector Machine (SVM) is used for classification. Experiments are conducted on three popular datasets to evaluate the effect of the various techniques: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. We reach accuracy rates of 79.41%, 90.21%, and 85.37% for the RAVDESS, EMO-DB, and IEMOCAP datasets, respectively. The experimental results clearly show that the proposed technique can accomplish speech emotion recognition efficiently and that it outperforms comparable methods in terms of classification accuracy.
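
A rough sketch of the hybrid feature construction (time-averaged acoustic features concatenated with deep spectrogram features, then ReliefF selection and an SVM) follows. It is an illustration under assumptions, not the authors' code: the sampling rate, feature averaging, the deep_extractor callable, and the use of the skrebate ReliefF implementation are all our own choices.

```python
# Sketch of the hybrid acoustic + deep feature pipeline described above.
# Not the authors' code; parameter values and the skrebate ReliefF choice are assumptions.
import numpy as np
import librosa

def acoustic_features(wav_path, sr=16000, n_mfcc=13):
    """RMS energy, zero-crossing rate, and MFCCs, time-averaged into one vector."""
    y, _ = librosa.load(wav_path, sr=sr)
    rms = librosa.feature.rms(y=y).mean()
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.concatenate([[rms, zcr], mfcc])

def hybrid_vector(wav_path, deep_extractor):
    """Concatenate acoustic features with deep features from a pre-trained CNN applied to
    the spectrogram image (deep_extractor is an assumed callable returning a 1-D array)."""
    return np.concatenate([acoustic_features(wav_path), deep_extractor(wav_path)])

# Feature selection + classification (a matrix X of hybrid vectors and labels y assumed):
# from skrebate import ReliefF
# from sklearn.svm import SVC
# selector = ReliefF(n_features_to_select=200, n_neighbors=10).fit(X, y)
# clf = SVC(kernel="rbf").fit(selector.transform(X), y)
```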

48 citations

Proceedings ArticleDOI
D. Friedman
26 Apr 1985
TL;DR: A new time-frequency display is constructed based on the phase of the running short-time Fourier transform, specifically the distribution of its time derivative, indicating more precise location of formants than is usual for the spectrogram.
Abstract: A new time-frequency display is constructed based on the phase of the running short-time Fourier transform, specifically the distribution of its time derivative. Typical results are given for speech, indicating more precise location of formants than is usual for the spectrogram.
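
The display described here is built from the time derivative of the short-time Fourier transform phase, i.e. an instantaneous-frequency estimate per time-frequency bin. A minimal numerical sketch of that idea is given below; it is not the original 1985 implementation, and the window and hop sizes are assumptions.

```python
# Sketch: time-frequency display from the time derivative of the STFT phase
# (per-bin instantaneous-frequency estimate). Window/hop sizes are illustrative.
import numpy as np
from scipy.signal import stft

def phase_derivative_display(x, fs, nperseg=512, noverlap=384):
    """Return frequencies, frame times, and the per-bin instantaneous frequency (Hz),
    estimated from the unwrapped STFT phase difference between successive frames."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    phase = np.unwrap(np.angle(Z), axis=1)      # unwrap phase along the time axis
    hop = (nperseg - noverlap) / fs             # frame step in seconds
    dphi_dt = np.diff(phase, axis=1) / hop      # phase derivative, rad/s per bin
    inst_freq = dphi_dt / (2 * np.pi)           # convert to Hz
    return f, t[1:], inst_freq
```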

48 citations

Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, a two-stream convolutional network for audio recognition is proposed, which operates on time-frequency spectrogram inputs and achieves state-of-the-art results on both VGG-Sound and EPIC-KITCHENS-100 datasets.
Abstract: We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both.
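
A toy sketch of the two-pathway idea (a Slow stream with high channel capacity and coarse temporal stride, a Fast stream at full temporal resolution, and a lateral connection fusing Fast into Slow) is shown below. It only illustrates the concept and is not the authors' architecture: it omits the separable convolutions and multi-level lateral connections, and all layer sizes are assumptions.

```python
# Toy Slow-Fast two-stream sketch for spectrogram inputs; sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySlowFastAudio(nn.Module):
    def __init__(self, n_classes=10, slow_ch=64, fast_ch=8, alpha=4):
        super().__init__()
        # Input: (batch, 1, time, freq) log-mel spectrograms.
        self.slow = nn.Sequential(   # coarse temporal stride, many channels
            nn.Conv2d(1, slow_ch, kernel_size=7, stride=(alpha, 2), padding=3),
            nn.BatchNorm2d(slow_ch), nn.ReLU(),
        )
        self.fast = nn.Sequential(   # full temporal resolution, few channels
            nn.Conv2d(1, fast_ch, kernel_size=7, stride=(1, 2), padding=3),
            nn.BatchNorm2d(fast_ch), nn.ReLU(),
        )
        # Lateral connection: time-strided conv aligns Fast features with the Slow time axis.
        self.lateral = nn.Conv2d(fast_ch, fast_ch, kernel_size=(5, 1),
                                 stride=(alpha, 1), padding=(2, 0))
        self.fc = nn.Linear(slow_ch + 2 * fast_ch, n_classes)

    def forward(self, x):
        s, f = self.slow(x), self.fast(x)
        s = torch.cat([s, self.lateral(f)], dim=1)          # fuse Fast into Slow
        gs = F.adaptive_avg_pool2d(s, 1).flatten(1)          # global pool each stream
        gf = F.adaptive_avg_pool2d(f, 1).flatten(1)
        return self.fc(torch.cat([gs, gf], dim=1))

# Usage sketch: logits = TinySlowFastAudio()(torch.randn(2, 1, 128, 64))  # (batch, classes)
```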

48 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2024	1
2023	627
2022	1,396
2021	488
2020	595
2019	593