scispace - formally typeset

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5813 publications have been published within this topic receiving 81547 citations.


Papers
Patent
27 Aug 2008
TL;DR: In this article, the authors obtained a separated signal from an audio signal based on the anisotropy of smoothness of spectral elements in the time-frequency domain, where a spectrogram of the audio signal is assumed to be a sum of a plurality of sub-spectrograms.
Abstract: The present invention obtains a separated signal from an audio signal based on the anisotropy of smoothness of spectral elements in the time-frequency domain. A spectrogram of the audio signal is assumed to be a sum of a plurality of sub-spectrograms, and smoothness of spectral elements of each sub-spectrogram in the time-frequency domain has directionality on the time-frequency plane. The method comprises obtaining a distribution coefficient for distributing spectral elements of said audio signal in the time-frequency domain to at least one sub-spectrogram based on the directionality of the smoothness of each sub-spectrogram on the time-frequency plane, and separating at least one sub-spectrogram from said spectral elements of said audio signal using said distribution coefficient.
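The patent's idea of distributing spectral energy according to the directional smoothness of each sub-spectrogram is closely related to harmonic/percussive-style separation. Below is a minimal NumPy sketch under that assumption; the function name, moving-average smoother, and squared-magnitude masks are illustrative choices, not the patent's specification.

```python
import numpy as np

def directional_masks(S, k=9):
    """Soft masks from smoothness along time vs. frequency.

    S: magnitude spectrogram, shape (freq_bins, time_frames).
    Components smooth over time (e.g. sustained partials) are routed to
    one sub-spectrogram; components smooth over frequency (e.g. broadband
    onsets) to the other. The masks act as distribution coefficients.
    """
    kern = np.ones(k) / k
    # Moving-average smoothing along each direction of the T-F plane.
    H = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, S)
    P = np.apply_along_axis(lambda c: np.convolve(c, kern, mode="same"), 0, S)
    eps = 1e-12
    mask_h = H**2 / (H**2 + P**2 + eps)   # distribution coefficient per bin
    mask_p = 1.0 - mask_h
    return mask_h * S, mask_p * S          # sub-spectrograms summing to S

# Toy spectrogram: one horizontal line (tone) + one vertical line (click).
S = np.zeros((64, 64))
S[20, :] += 1.0   # sustained tone, smooth along time
S[:, 40] += 1.0   # impulsive click, smooth along frequency
tone, click = directional_masks(S)
```

Because the two masks sum to one in every time-frequency bin, the separated sub-spectrograms always recombine to the original, which mirrors the patent's assumption that the spectrogram is a sum of sub-spectrograms.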

29 citations

Journal ArticleDOI
TL;DR: This paper uses a single ultrasonic sensor to detect staircases with an electronic cane; using a multiclass SVM approach, a recognition rate of 82.4% is achieved.
Abstract: Blind people need aids to interact with their environment more safely. A new device is therefore proposed to enable them to see the world with their ears. Considering both system requirements and technology cost, our tool uses ultrasonic sensors and one monocular camera to make the user aware of the presence and nature of potential obstacles. In this paper, we focus on using a single ultrasonic sensor to detect staircases with an electronic cane; no previous work has considered this challenge. Aware that the performance of an object recognition system depends on both the object representation and the classification algorithm, our system uses three frequency-domain representations of the ultrasonic signal: the spectrogram, which shows how the spectral density of the signal varies with time; the spectrum, which shows amplitude as a function of frequency; and the periodogram, which estimates the spectral density of the signal. Features extracted from each representation contribute to the classification process. Our system was evaluated on a set of ultrasonic signals in which staircases occur with different shapes. Using a multiclass SVM approach, a recognition rate of 82.4% was achieved.
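The three representations named in the abstract can all be computed with standard tools; a sketch with NumPy/SciPy on a synthetic echo follows. The sampling rate, the decaying-sinusoid "echo", and the particular features at the end are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 40_000                                   # assumed sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
# Synthetic ultrasonic echo: decaying 8 kHz tone plus noise.
echo = np.sin(2 * np.pi * 8_000 * t) * np.exp(-40 * t) \
       + 0.05 * rng.standard_normal(t.size)

# 1. Spectrogram: how the spectral density varies with time.
f_sg, t_sg, Sxx = signal.spectrogram(echo, fs=fs, nperseg=256)

# 2. Spectrum: amplitude as a function of frequency.
spectrum = np.abs(np.fft.rfft(echo))
freqs = np.fft.rfftfreq(echo.size, d=1 / fs)

# 3. Periodogram: estimate of the power spectral density.
f_pg, Pxx = signal.periodogram(echo, fs=fs)

# Example features drawn from each representation (illustrative choices).
features = np.array([
    Sxx.max(),                    # peak time-frequency energy
    freqs[np.argmax(spectrum)],   # dominant frequency from the spectrum
    Pxx.sum(),                    # total estimated power
])
```

A feature vector like `features`, concatenated across representations, is the kind of input a multiclass SVM classifier would then consume.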

29 citations

Posted Content
TL;DR: A data augmentation algorithm based on the imaging principle of the retina and convex lens is proposed to acquire spectrograms of different sizes and increase the amount of training data by changing the distance between the spectrogram and the convex lens.
Abstract: Speech emotion recognition (SER) studies the formation and change of a speaker's emotional state from the speech signal, so as to make human-computer interaction more intelligent. SER is a challenging task that suffers from scarce training data and low prediction accuracy. Here we propose a data augmentation algorithm based on the imaging principle of the retina and convex lens, which acquires spectrograms of different sizes and increases the amount of training data by changing the distance between the spectrogram and the convex lens. Meanwhile, using deep learning to extract high-level features, we propose Deep Retinal Convolution Neural Networks (DRCNNs) for SER and achieve an average accuracy above 99%. The experimental results indicate that DRCNNs outperform previous studies in terms of both the number of emotions and the accuracy of recognition. Predictably, our results will dramatically improve human-computer interaction.
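The lens analogy amounts to producing rescaled copies of each training spectrogram, one per "distance". A minimal sketch with `scipy.ndimage.zoom`; the function name and scale factors are assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.ndimage import zoom

def lens_augment(spec, scales=(0.75, 1.0, 1.25)):
    """Return rescaled copies of a spectrogram.

    Mimics moving an image relative to a convex lens: each 'distance'
    yields a differently sized image of the same spectrogram, enlarging
    the training set without new recordings. Scale values here are
    illustrative.
    """
    return [zoom(spec, s, order=1) for s in scales]

spec = np.random.default_rng(1).random((128, 64))  # stand-in spectrogram
augmented = lens_augment(spec)
sizes = [a.shape for a in augmented]               # three differently sized views
```

Each rescaled copy keeps the same emotional content but a different resolution, which is what lets the augmented set grow with the number of chosen "distances".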

29 citations

Posted Content
TL;DR: This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals based on the analysis of mixtures of sinusoids, along with an audio restoration framework; the technique outperforms traditional methods.
Abstract: This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the Time-Frequency (TF) domain. To obtain similar relationships over frequencies, in particular within onset frames, we study an impulse model. Instantaneous frequencies and attack times are estimated locally to encompass the class of non-stationary signals such as vibratos. These techniques ensure both the vertical coherence of partials (over frequencies) and the horizontal coherence (over time). The method is tested on a variety of data and demonstrates better performance than traditional consistency-based approaches. We also introduce an audio restoration framework and observe that our technique outperforms traditional methods.
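The horizontal (time) coherence relation the abstract describes can be sketched directly: for a sinusoid, the phase in one frame follows from the previous frame's phase and the bin's instantaneous frequency. The sketch below shows only that relation, with assumed parameter values; the onset/impulse model that gives vertical (frequency) coherence is omitted.

```python
import numpy as np

def propagate_phase(phase_prev, inst_freq, hop, fs):
    """Advance per-bin STFT phases by one hop using instantaneous frequency.

    A sinusoid at frequency f (Hz) advances its phase by 2*pi*f*hop/fs
    radians between consecutive frames; this is the horizontal
    (time) coherence relation between successive frames.
    """
    return phase_prev + 2 * np.pi * inst_freq * hop / fs

# Check against a pure tone: the predicted phase matches the signal's
# actual phase one hop later (modulo 2*pi).
fs, hop, f0, phase0 = 16_000, 256, 440.0, 0.3
n = np.arange(fs)
z = np.exp(1j * (2 * np.pi * f0 * n / fs + phase0))   # analytic tone
pred = propagate_phase(phase0, f0, hop, fs)
```

For non-stationary signals such as vibratos, the paper estimates `inst_freq` locally per frame, so the same recurrence still applies with a time-varying frequency.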

29 citations

Proceedings ArticleDOI
Zhenyao Zhu1, Jesse Engel1, Awni Hannun1
08 Sep 2016
TL;DR: In this paper, the authors use convolutional filters to push past the inherent tradeoff of temporal and frequency resolution that exists for spectral representations, and show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements.
Abstract: Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform. In this paper, we detail an approach to use convolutional filters to push past the inherent tradeoff of temporal and frequency resolution that exists for spectral representations. At increased computational cost, we show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements. Further, we find more efficient representations by simultaneously learning at multiple scales, leading to an overall decrease in word error rate on a difficult internal speech test set by 20.7% relative to networks with the same number of parameters trained on spectrograms.
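The tradeoff the paper attacks is the classic STFT one: a short analysis window gives fine temporal but coarse frequency resolution, and vice versa. A small demonstration with assumed signal parameters (two tones 30 Hz apart) makes the bin-spacing consequence concrete; the paper's learned convolutional filters are not reproduced here.

```python
import numpy as np
from scipy import signal

# Two nearby tones, 30 Hz apart: resolvable only with a long window.
fs = 8_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1030 * t)

def freq_resolution(nperseg):
    """Frequency-bin spacing (Hz) of a spectrogram with this window length."""
    f, _, Sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg)
    return f[1] - f[0]

short_df = freq_resolution(64)     # 125 Hz bins: both tones fall in one bin
long_df = freq_resolution(1024)    # ~7.8 Hz bins: tones land in separate bins
```

Since bin spacing is `fs / nperseg`, improving frequency resolution by lengthening the window necessarily coarsens the time axis; the paper's approach of reduced stride plus additional filters at multiple scales is a way of buying both at once, at extra computational cost.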

29 citations


Network Information
Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations (79% related)
- Convolutional neural network: 74.7K papers, 2M citations (78% related)
- Feature extraction: 111.8K papers, 2.1M citations (77% related)
- Wavelet: 78K papers, 1.3M citations (76% related)
- Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593