Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Dissertation
01 Jan 2014
TL;DR: The approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, inspired by techniques from the field of image processing.
Abstract: The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and, more generally, the rich transcription of acoustic environments. The challenges in such environments are the adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones compared to the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these challenging unstructured environments, and it is important to develop new methods that can perform well for this challenging task. In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing. The motivation for such an approach is based on finding an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information.
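To make the "spectrogram as image" idea concrete, here is a minimal Python sketch: it computes a short-time Fourier magnitude spectrogram and normalises it into a greyscale-image-like array. The synthetic 1 kHz tone, the sample rate and the STFT sizes are illustrative assumptions, not the thesis's actual configuration.

```python
# A minimal sketch of the "spectrogram as image" idea: compute a
# time-frequency magnitude map and hand it to image-style processing.
# The signal here is synthetic; in the thesis the input would be a
# recorded sound event.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                   # sample rate in Hz (assumed)
t = np.arange(fs) / fs                       # one second of audio
x = np.sin(2 * np.pi * 1000 * t)             # toy 1 kHz "event"

# Short-time Fourier analysis: rows = frequency bins, columns = time frames
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)

# Log-compress and normalise to [0, 1] so the array behaves like a
# greyscale image that image-processing operators can act on.
img = np.log1p(Sxx)
img = (img - img.min()) / (img.max() - img.min() + 1e-12)
print(img.shape)   # (freq_bins, time_frames): a 2-D "picture" of the sound
```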

62 citations

Journal Article
TL;DR: A new target recognition scheme via adaptive Gaussian representation, which uses adaptive joint time-frequency processing techniques; exact and closed-form expressions of the geometrical moments of the adaptive spectrogram are derived in the time, frequency, and joint time-frequency domains.
Abstract: This paper presents a new target recognition scheme via adaptive Gaussian representation, which uses adaptive joint time-frequency processing techniques. The feature extraction stage of the proposed scheme utilizes the geometrical moments of the adaptive spectrogram. For this purpose, we have derived exact and closed-form expressions of the geometrical moments of the adaptive spectrogram in the time, frequency, and joint time-frequency domains. Features obtained by this method can provide substantial savings of computational resources, while preserving as much essential information for classifying targets as possible. Next, a principal component analysis is used to further reduce the dimension of the feature space, and the resulting feature vectors are passed to the classifier stage, which is based on a multilayer perceptron neural network. To demonstrate the performance of the proposed scheme, various thin-wire targets are identified. The results show that the proposed technique has significant potential for use in target recognition.
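As a rough illustration of this pipeline, the sketch below computes geometric moments m_pq = sum over (t, f) of t^p f^q S(f, t) by direct summation and reduces them with PCA. The paper derives these moments in closed form for the adaptive Gaussian spectrogram; the brute-force summation, the random spectrograms and the dimensions here are stand-in assumptions.

```python
# A hedged numerical sketch of the feature pipeline: geometric moments
# m_pq of a spectrogram S(f, t), followed by PCA. The paper derives the
# moments in closed form for the adaptive Gaussian spectrogram; here they
# are simply computed by direct summation on arbitrary spectrograms.
import numpy as np
from sklearn.decomposition import PCA

def geometric_moments(S, max_order=3):
    """Return m_pq = sum_{t,f} t^p f^q S[f, t] for p + q <= max_order."""
    n_f, n_t = S.shape
    tt, ff = np.meshgrid(np.arange(n_t), np.arange(n_f))  # both (n_f, n_t)
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            feats.append(np.sum((tt ** p) * (ff ** q) * S))
    return np.array(feats)

# Toy "dataset": moment features for 20 random spectrograms, reduced to
# 3 principal components before the MLP classifier stage would run.
rng = np.random.default_rng(0)
X = np.stack([geometric_moments(rng.random((64, 32))) for _ in range(20)])
Z = PCA(n_components=3).fit_transform(X)
print(Z.shape)   # (20, 3) reduced feature vectors
```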

62 citations

Posted Content
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
TL;DR: ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram, is proposed; it is fully convolutional and brings a 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis while obtaining reasonably good speech quality.
Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings a 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Furthermore, we build the parallel text-to-speech system and test various parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We also explore a novel VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, which avoids the need for distillation from a separately trained WaveNet as in previous work.
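The sketch below is a toy PyTorch model (not the actual ParaNet architecture) showing the core idea of non-autoregressive synthesis: a fully convolutional stack emits every spectrogram frame in one feed-forward pass instead of decoding frame by frame. All names and layer sizes are illustrative assumptions, and the output length here simply mirrors the token length, whereas the real model learns the text-to-spectrogram alignment via attention.

```python
# A toy sketch of non-autoregressive text-to-spectrogram conversion:
# one feed-forward pass produces all frames at once, which is the source
# of the reported speed-up over autoregressive, frame-by-frame decoders.
import torch
import torch.nn as nn

class TinyParallelTTS(nn.Module):
    def __init__(self, vocab=64, channels=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, channels)
        self.convs = nn.Sequential(                  # non-autoregressive conv stack
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
        )
        self.to_mel = nn.Conv1d(channels, n_mels, 1)  # project to mel bins

    def forward(self, tokens):                        # tokens: (batch, T)
        h = self.embed(tokens).transpose(1, 2)        # (batch, C, T)
        return self.to_mel(self.convs(h))             # (batch, n_mels, T)

# Every frame is emitted in a single pass; no frame depends on the previous one.
mel = TinyParallelTTS()(torch.randint(0, 64, (1, 50)))
print(mel.shape)   # torch.Size([1, 80, 50])
```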

62 citations

Posted Content
TL;DR: This study compares the performance of two classes of models: a deep learning approach, wherein a CNN model is trained end-to-end to predict the genre label of an audio signal solely from its spectrogram, and an approach based on hand-crafted features.
Abstract: Categorizing music files according to their genre is a challenging task in the area of music information retrieval (MIR). In this study, we compare the performance of two classes of models. The first is a deep learning approach wherein a CNN model is trained end-to-end to predict the genre label of an audio signal, solely using its spectrogram. The second approach utilizes hand-crafted features, both from the time domain and the frequency domain. We train four traditional machine learning classifiers with these features and compare their performance. The features that contribute the most towards this multi-class classification task are identified. The experiments are conducted on the Audio Set dataset, and we report an AUC value of 0.894 for an ensemble classifier which combines the two proposed approaches.
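For intuition, here is a minimal sketch of the hand-crafted-feature branch: one time-domain descriptor (zero-crossing rate) and one frequency-domain descriptor (spectral centroid) feed a traditional classifier. The particular features, the logistic-regression classifier and the noise-versus-tone toy data are assumptions for illustration, not the study's actual setup.

```python
# A minimal sketch of the hand-crafted-feature branch: time- and
# frequency-domain descriptors fed to a traditional classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def handcrafted_features(x, fs=22050):
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2           # zero-crossing rate
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)   # spectral centroid
    return [zcr, centroid]

# Toy two-"genre" problem: noise vs. a pure tone stand in for real clips.
rng = np.random.default_rng(0)
X = [handcrafted_features(rng.standard_normal(22050)) for _ in range(10)] + \
    [handcrafted_features(np.sin(2 * np.pi * 440 * np.arange(22050) / 22050))
     for _ in range(10)]
y = [0] * 10 + [1] * 10
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))   # the CNN branch would consume the spectrogram instead
```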

61 citations

Patent
23 Nov 1994
TL;DR: In this article, a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices is presented. The method relies on a neural network ensemble that has been trained to favor specific features and/or parameters.
Abstract: The present invention provides a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices. The present invention accomplishes this by first generating a spectrogram from the received sonar signal. The spectrogram is characterized in terms of textural features and signal processing parameters. The textural features and signal processing parameters are fed into a neural network ensemble that has been trained to favor specific features and/or parameters. The trained neural network ensemble classifies the signal as either Type-I or clutter.
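As a hedged illustration of the "textural features" step, the sketch below quantises a spectrogram to an 8-bit image and computes grey-level co-occurrence (GLCM) texture statistics. GLCM descriptors are a standard choice for image texture, but the patent does not specify the exact features, so the random stand-in signal and the chosen GLCM properties are assumptions.

```python
# A hedged sketch of the "textural features" step: grey-level
# co-occurrence statistics computed on a quantised spectrogram image.
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                 # stand-in for a sonar snippet
_, _, Sxx = spectrogram(x, fs=16000, nperseg=256)

# Quantise the log-spectrogram to 8-bit grey levels for the GLCM.
img = np.log1p(Sxx)
img = np.uint8(255 * (img - img.min()) / (img.max() - img.min()))

glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
texture = [graycoprops(glcm, p)[0, 0]
           for p in ("contrast", "homogeneity", "energy")]
print(texture)   # features that would feed the neural-network ensemble
```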

61 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593