Topic
Spectrogram
About: Spectrogram is a research topic. Over its lifetime, 5813 publications have been published within this topic, receiving 81547 citations.
Papers published on a yearly basis
Papers
27 Aug 2008
TL;DR: In this article, the authors obtained a separated signal from an audio signal based on the anisotropy of smoothness of spectral elements in the time-frequency domain, where a spectrogram of the audio signal is assumed to be a sum of a plurality of sub-spectrograms.
Abstract: The present invention obtains a separated signal from an audio signal based on the anisotropy of smoothness of spectral elements in the time-frequency domain. A spectrogram of the audio signal is assumed to be a sum of a plurality of sub-spectrograms, and smoothness of spectral elements of each sub-spectrogram in the time-frequency domain has directionality on the time-frequency plane. The method comprises obtaining a distribution coefficient for distributing spectral elements of said audio signal in the time-frequency domain to at least one sub-spectrogram based on the directionality of the smoothness of each sub-spectrogram on the time-frequency plane, and separating at least one sub-spectrogram from said spectral elements of said audio signal using said distribution coefficient.
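The "distribution coefficient" idea above can be illustrated with a common related technique: deriving soft masks from directional smoothness of the spectrogram. The sketch below is not the patent's method; it uses median filtering along time vs. frequency as a stand-in for the smoothness model, and all function names are my own.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

def running_median(a, size, axis):
    """Running median along one axis (edges padded by repetition)."""
    pad = size // 2
    a = np.moveaxis(a, axis, 0)
    padded = np.pad(a, ((pad, pad),) + ((0, 0),) * (a.ndim - 1), mode="edge")
    out = np.stack([np.median(padded[i:i + size], axis=0)
                    for i in range(a.shape[0])])
    return np.moveaxis(out, 0, axis)

def separate(spec, size=9, eps=1e-12):
    """Split a spectrogram into time-smooth and frequency-smooth parts.

    The soft mask plays the role of a distribution coefficient: each
    spectral element is distributed to a sub-spectrogram according to
    which smoothness direction dominates locally.
    """
    smooth_t = running_median(spec, size, axis=1)  # smooth along time
    smooth_f = running_median(spec, size, axis=0)  # smooth along frequency
    mask = smooth_t / (smooth_t + smooth_f + eps)
    return mask * spec, (1.0 - mask) * spec
```

A steady tone is smooth along time, so it is routed almost entirely to the first sub-spectrogram, while broadband clicks (smooth along frequency) go to the second.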
29 citations
TL;DR: This paper uses a single ultrasonic sensor to detect staircases with an electronic cane; using a multiclass SVM approach, a recognition rate of 82.4% is achieved.
Abstract: Blind people need aids to interact with their environment more safely. A new device is proposed to let them perceive the world through their ears. Considering both system requirements and technology cost, our tool uses ultrasonic sensors and one monocular camera to make the user aware of the presence and nature of potential obstacles. In this paper, we use only one ultrasonic sensor to detect staircases with the electronic cane; to our knowledge, no previous work has addressed this challenge. Since the performance of an object recognition system depends on both the object representation and the classification algorithm, our system uses three frequency-domain representations of the ultrasonic signal: the spectrogram, which shows how the spectral density of the signal varies with time; the spectrum, which shows amplitude as a function of frequency; and the periodogram, which estimates the spectral density of the signal. Features extracted from each representation contribute to the classification process. The system was evaluated on a set of ultrasonic signals in which staircases of different shapes occur. Using a multiclass SVM approach, a recognition rate of 82.4% has been achieved.
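The feature-extraction pipeline described above (spectrogram, spectrum, and periodogram features feeding a classifier) can be sketched as follows. This is an illustration under my own assumptions, not the paper's code: the feature choices are hypothetical, and a toy nearest-centroid classifier stands in for the multiclass SVM.

```python
import numpy as np

def periodogram(x):
    """Power spectral density estimate from the squared FFT magnitude."""
    X = np.fft.rfft(x * np.hanning(len(x)))
    return (np.abs(X) ** 2) / len(x)

def spectrogram_features(x, n_fft=64, hop=32):
    """Per-band mean log energy of a short-time spectrogram."""
    win = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft, hop)])
    S = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(S.mean(axis=0) + 1e-12)  # average over time frames

def features(x):
    """Concatenate spectrogram and low-frequency periodogram features."""
    return np.concatenate([spectrogram_features(x),
                           np.log(periodogram(x)[:16] + 1e-12)])

class NearestCentroid:
    """Toy stand-in for the paper's multiclass SVM classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

With echoes whose spectral content differs by obstacle shape, these frequency-domain features separate the classes well even with a very simple classifier.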
29 citations
TL;DR: A data augmentation algorithm based on the imaging principle of the retina and a convex lens is proposed to acquire spectrograms of different sizes, increasing the amount of training data by changing the distance between the spectrogram and the convex lens.
Abstract: Speech emotion recognition (SER) studies the formation and change of a speaker's emotional state from the speech-signal perspective, so as to make interaction between humans and computers more intelligent. SER is a challenging task that suffers from scarce training data and low prediction accuracy. Here we propose a data augmentation algorithm based on the imaging principle of the retina and a convex lens, which acquires spectrograms of different sizes and increases the amount of training data by changing the distance between the spectrogram and the convex lens. Meanwhile, using deep learning to obtain high-level features, we propose Deep Retinal Convolution Neural Networks (DRCNNs) for SER and achieve an average accuracy over 99%. The experimental results indicate that DRCNNs outperform previous studies in terms of both the number of emotions and recognition accuracy. Predictably, our results will dramatically improve human-computer interaction.
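The lens-based augmentation amounts to producing rescaled copies of each spectrogram, with the scale factor playing the role of the lens distance. A minimal sketch under that assumption (nearest-neighbor resizing; the paper's actual resampling scheme may differ, and the function names are my own):

```python
import numpy as np

def resize_spectrogram(S, scale):
    """Nearest-neighbor resize of a 2-D spectrogram by a zoom factor."""
    f, t = S.shape
    fi = np.clip((np.arange(int(f * scale)) / scale).astype(int), 0, f - 1)
    ti = np.clip((np.arange(int(t * scale)) / scale).astype(int), 0, t - 1)
    return S[np.ix_(fi, ti)]

def augment(S, scales=(0.8, 1.0, 1.25)):
    """One resized copy per 'lens distance' (scale factor)."""
    return [resize_spectrogram(S, s) for s in scales]
```

Each input spectrogram yields several training examples of different sizes; scale 1.0 returns the original unchanged.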
29 citations
TL;DR: This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals from the analysis of mixtures of sinusoids, and presents an audio restoration framework in which the technique outperforms traditional methods.
Abstract: This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the Time-Frequency (TF) domain. To obtain similar relationships over frequencies, in particular within onset frames, we study an impulse model. Instantaneous frequencies and attack times are estimated locally to encompass the class of non-stationary signals such as vibratos. These techniques ensure both the vertical coherence of partials (over frequencies) and the horizontal coherence (over time). The method is tested on a variety of data and demonstrates better performance than traditional consistency-based approaches. We also introduce an audio restoration framework and observe that our technique outperforms traditional methods.
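The "horizontal coherence" part of the method (relating phases of successive time frames through a sinusoid's frequency) can be illustrated with a minimal phase-propagation recursion. This sketch is a simplification of the paper's model: it assumes each bin's instantaneous frequency equals its nominal center frequency and omits the vertical (across-frequency) coherence enforced at onsets.

```python
import numpy as np

def propagate_phase(mag, phase0, hop, sr, n_fft):
    """Propagate phases frame-to-frame from each bin's nominal frequency.

    For a stationary sinusoid in bin k, the phase advances by
    2*pi * f_k * hop / sr between consecutive frames.
    """
    n_bins, n_frames = mag.shape
    bin_freq = np.arange(n_bins) * sr / n_fft      # assumed inst. frequency (Hz)
    dphi = 2 * np.pi * bin_freq * hop / sr         # phase advance per hop
    phases = np.empty((n_bins, n_frames))
    phases[:, 0] = phase0
    for t in range(1, n_frames):
        phases[:, t] = phases[:, t - 1] + dphi     # horizontal coherence only
    return mag * np.exp(1j * phases)
```

Given a modified magnitude spectrogram and initial phases, this yields a complex spectrogram whose magnitudes are untouched and whose phases are consistent over time for stationary partials.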
29 citations
08 Sep 2016
TL;DR: In this paper, the authors use convolutional filters to push past the inherent tradeoff of temporal and frequency resolution that exists for spectral representations, and show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements.
Abstract: Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform. In this paper, we detail an approach to use convolutional filters to push past the inherent tradeoff of temporal and frequency resolution that exists for spectral representations. At increased computational cost, we show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements. Further, we find more efficient representations by simultaneously learning at multiple scales, leading to an overall decrease in word error rate on a difficult internal speech test set by 20.7% relative to networks with the same number of parameters trained on spectrograms.
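The stride/filter-count tradeoff described above can be made concrete with a tiny waveform frontend: a strided 1-D convolution whose stride sets the temporal resolution and whose filter count sets the number of "frequency" channels. This is an illustrative stand-in, not the paper's network; random filters play the role of learned ones.

```python
import numpy as np

def conv1d_frontend(x, n_filters=32, width=128, stride=64, seed=0):
    """Strided 1-D convolution over a raw waveform.

    Halving `stride` roughly doubles the number of output time steps
    (finer temporal resolution); raising `n_filters` adds channels
    (finer frequency-like resolution), both at extra compute cost.
    """
    rng = np.random.default_rng(seed)
    filters = rng.standard_normal((n_filters, width)) / np.sqrt(width)
    frames = np.stack([x[i:i + width]
                       for i in range(0, len(x) - width + 1, stride)])
    return np.abs(frames @ filters.T)  # shape: (time_steps, n_filters)
```

For a 1024-sample input with width 128, stride 64 gives 15 time steps; stride 32 gives 29, nearly double, while the channel count is set independently by `n_filters`.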
29 citations