Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Dissertation
01 Jan 2014
TL;DR: The approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, inspired by techniques from the field of image processing.
Abstract: The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and, more generally, the rich transcription of acoustic environments. The challenges in such environments are the adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones compared to the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these challenging unstructured environments, and it is important to develop new methods that can perform well for this challenging task. In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing. The motivation for such an approach is based on finding an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information.
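To make the "spectrogram as image" idea concrete, here is a minimal Python sketch: it computes a short-time Fourier magnitude spectrogram and normalises it into a greyscale-image-like array. The synthetic 1 kHz tone, the sample rate and the STFT sizes are illustrative assumptions, not the thesis's actual configuration.

```python
# A minimal sketch of the "spectrogram as image" idea: compute a
# time-frequency magnitude map and hand it to image-style processing.
# The signal here is synthetic; in the thesis the input would be a
# recorded sound event.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                   # sample rate in Hz (assumed)
t = np.arange(fs) / fs                       # one second of audio
x = np.sin(2 * np.pi * 1000 * t)             # toy 1 kHz "event"

# Short-time Fourier analysis: rows = frequency bins, columns = time frames
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)

# Log-compress and normalise to [0, 1] so the array behaves like a
# greyscale image that image-processing operators can act on.
img = np.log1p(Sxx)
img = (img - img.min()) / (img.max() - img.min() + 1e-12)
print(img.shape)   # (freq_bins, time_frames): a 2-D "picture" of the sound
```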

62 citations

Journal Article
TL;DR: A new target recognition scheme via adaptive Gaussian representation, which uses adaptive joint time-frequency processing techniques; exact and closed-form expressions of the geometrical moments of the adaptive spectrogram are derived in the time, frequency, and joint time-frequency domains.
Abstract: This paper presents a new target recognition scheme via adaptive Gaussian representation, which uses adaptive joint time-frequency processing techniques. The feature extraction stage of the proposed scheme utilizes the geometrical moments of the adaptive spectrogram. For this purpose, we have derived exact and closed-form expressions of the geometrical moments of the adaptive spectrogram in the time, frequency, and joint time-frequency domains. Features obtained by this method can provide substantial savings of computational resources, while preserving as much essential information for classifying targets as possible. Next, a principal component analysis is used to further reduce the dimension of the feature space, and the resulting feature vectors are passed to the classifier stage, which is based on a multilayer perceptron neural network. To demonstrate the performance of the proposed scheme, various thin-wire targets are identified. The results show that the proposed technique has significant potential for use in target recognition.
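As a rough illustration of this pipeline, the sketch below computes geometric moments m_pq = sum over (t, f) of t^p f^q S(f, t) by direct summation and reduces them with PCA. The paper derives these moments in closed form for the adaptive Gaussian spectrogram; the brute-force summation, the random spectrograms and the dimensions here are stand-in assumptions.

```python
# A hedged numerical sketch of the feature pipeline: geometric moments
# m_pq of a spectrogram S(f, t), followed by PCA. The paper derives the
# moments in closed form for the adaptive Gaussian spectrogram; here they
# are simply computed by direct summation on arbitrary spectrograms.
import numpy as np
from sklearn.decomposition import PCA

def geometric_moments(S, max_order=3):
    """Return m_pq = sum_{t,f} t^p f^q S[f, t] for p + q <= max_order."""
    n_f, n_t = S.shape
    tt, ff = np.meshgrid(np.arange(n_t), np.arange(n_f))  # both (n_f, n_t)
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            feats.append(np.sum((tt ** p) * (ff ** q) * S))
    return np.array(feats)

# Toy "dataset": moment features for 20 random spectrograms, reduced to
# 3 principal components before the MLP classifier stage would run.
rng = np.random.default_rng(0)
X = np.stack([geometric_moments(rng.random((64, 32))) for _ in range(20)])
Z = PCA(n_components=3).fit_transform(X)
print(Z.shape)   # (20, 3) reduced feature vectors
```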

62 citations

Posted Content
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
TL;DR: ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram, is proposed; it is fully convolutional and brings a 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis while obtaining reasonably good speech quality.
Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings a 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Furthermore, we build the parallel text-to-speech system and test various parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We also explore a novel VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, which avoids the need for distillation from a separately trained WaveNet as in previous work.
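The sketch below is a toy PyTorch model (not the actual ParaNet architecture) showing the core idea of non-autoregressive synthesis: a fully convolutional stack emits every spectrogram frame in one feed-forward pass instead of decoding frame by frame. All names and layer sizes are illustrative assumptions, and the output length here simply mirrors the token length, whereas the real model learns the text-to-spectrogram alignment via attention.

```python
# A toy sketch of non-autoregressive text-to-spectrogram conversion:
# one feed-forward pass produces all frames at once, which is the source
# of the reported speed-up over autoregressive, frame-by-frame decoders.
import torch
import torch.nn as nn

class TinyParallelTTS(nn.Module):
    def __init__(self, vocab=64, channels=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, channels)
        self.convs = nn.Sequential(                  # non-autoregressive conv stack
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
        )
        self.to_mel = nn.Conv1d(channels, n_mels, 1)  # project to mel bins

    def forward(self, tokens):                        # tokens: (batch, T)
        h = self.embed(tokens).transpose(1, 2)        # (batch, C, T)
        return self.to_mel(self.convs(h))             # (batch, n_mels, T)

# Every frame is emitted in a single pass; no frame depends on the previous one.
mel = TinyParallelTTS()(torch.randint(0, 64, (1, 50)))
print(mel.shape)   # torch.Size([1, 80, 50])
```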

62 citations

Posted Content
TL;DR: This study compares the performance of two classes of models: a deep learning approach, wherein a CNN model is trained end-to-end to predict the genre label of an audio signal solely from its spectrogram, and an approach based on hand-crafted features.
Abstract: Categorizing music files according to their genre is a challenging task in the area of music information retrieval (MIR). In this study, we compare the performance of two classes of models. The first is a deep learning approach wherein a CNN model is trained end-to-end to predict the genre label of an audio signal, solely using its spectrogram. The second approach utilizes hand-crafted features, both from the time domain and the frequency domain. We train four traditional machine learning classifiers with these features and compare their performance. The features that contribute the most towards this multi-class classification task are identified. The experiments are conducted on the Audio Set dataset, and we report an AUC value of 0.894 for an ensemble classifier which combines the two proposed approaches.
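For intuition, here is a minimal sketch of the hand-crafted-feature branch: one time-domain descriptor (zero-crossing rate) and one frequency-domain descriptor (spectral centroid) feed a traditional classifier. The particular features, the logistic-regression classifier and the noise-versus-tone toy data are assumptions for illustration, not the study's actual setup.

```python
# A minimal sketch of the hand-crafted-feature branch: time- and
# frequency-domain descriptors fed to a traditional classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def handcrafted_features(x, fs=22050):
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2           # zero-crossing rate
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)   # spectral centroid
    return [zcr, centroid]

# Toy two-"genre" problem: noise vs. a pure tone stand in for real clips.
rng = np.random.default_rng(0)
X = [handcrafted_features(rng.standard_normal(22050)) for _ in range(10)] + \
    [handcrafted_features(np.sin(2 * np.pi * 440 * np.arange(22050) / 22050))
     for _ in range(10)]
y = [0] * 10 + [1] * 10
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))   # the CNN branch would consume the spectrogram instead
```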

61 citations

Patent
23 Nov 1994
TL;DR: In this article, a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices is presented. The method relies on a neural network ensemble that has been trained to favor specific features and/or parameters.
Abstract: The present invention provides a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices. The present invention accomplishes this by first generating a spectrogram from the received sonar signal. The spectrogram is characterized in terms of textural features and signal processing parameters. The textural features and signal processing parameters are fed into a neural network ensemble that has been trained to favor specific features and/or parameters. The trained neural network ensemble classifies the signal as either Type-I or clutter.
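As a hedged illustration of the "textural features" step, the sketch below quantises a spectrogram to an 8-bit image and computes grey-level co-occurrence (GLCM) texture statistics. GLCM descriptors are a standard choice for image texture, but the patent does not specify the exact features, so the random stand-in signal and the chosen GLCM properties are assumptions.

```python
# A hedged sketch of the "textural features" step: grey-level
# co-occurrence statistics computed on a quantised spectrogram image.
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                 # stand-in for a sonar snippet
_, _, Sxx = spectrogram(x, fs=16000, nperseg=256)

# Quantise the log-spectrogram to 8-bit grey levels for the GLCM.
img = np.log1p(Sxx)
img = np.uint8(255 * (img - img.min()) / (img.max() - img.min()))

glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
texture = [graycoprops(glcm, p)[0, 0]
           for p in ("contrast", "homogeneity", "energy")]
print(texture)   # features that would feed the neural-network ensemble
```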

61 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593