Topic

Spectrogram

About: Spectrogram is a research topic. Over the lifetime, 5813 publications have been published within this topic receiving 81547 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Patent•

Acoustic signature recognition and identification

[...]

Anthony Peter Russo¹•Institutions (1)

Alcatel-Lucent¹

30 Sep 1997-Journal of the Acoustical Society of America

TL;DR: In this paper, an acoustic signature recognition and identification system receives signals from a sensor placed on a designated piece of equipment, and the acoustic data is digitized and processed, via a Fast Fourier Transform routine, to create a spectrogram image of frequency versus time.

...read moreread less

Abstract: An acoustic signature recognition and identification system receives signals from a sensor placed on a designated piece of equipment. The acoustic data is digitized and processed, via a Fast Fourier Transform routine, to create a spectrogram image of frequency versus time. The spectrogram image is then normalized to permit acoustic pattern recognition regardless of the surrounding environment or magnitude of the acoustic signal. A feature extractor then detects, tracks and characterizes the lines which form the spectrogram. Specifically, the lines are detected via a KY process that is applied to each pixel in the line. A blob coloring process then groups spatially connected pixels into a single signal object. The harmonic content of the lines is then determined and compared with stored templates of known acoustic signatures to ascertain the type of machinery. An alert is then generated in response to the recognized and identified machinery.

...read moreread less

399 citations

Journal Article•DOI•

Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition

[...]

Jort F. Gemmeke¹, Tuomas Virtanen², Antti Hurmalainen²•Institutions (2)

Radboud University Nijmegen¹, Tampere University of Technology²

01 Sep 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.

...read moreread less

Abstract: This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large speech exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of noise and speech exemplars, and we derive an algorithm for recovering this sparse linear combination of exemplars from the observed noisy speech. We describe how the framework can be used for doing hybrid exemplar-based/HMM recognition by using the exemplar-activations together with the phonetic information associated with the exemplars. As an alternative to hybrid recognition, the framework also allows us to take a source separation approach which enables exemplar-based feature enhancement as well as missing data mask estimation. We evaluate the performance of these exemplar-based methods in connected digit recognition on the AURORA-2 database. Our results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB. Although not as effective as two baseline recognizers at higher SNRs, the novel approach offers a promising direction of future research on exemplar-based ASR.

...read moreread less

388 citations

Proceedings Article•DOI•

A Wavenet for Speech Denoising

[...]

Dario Rethage¹, Jordi Pons¹, Xavier Serra¹•Institutions (1)

Pompeu Fabra University¹

15 Apr 2018

TL;DR: The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature.

...read moreread less

Abstract: Most speech processing techniques use magnitude spectrograms as front-end and are therefore by default discarding part of the signal: the phase. In order to overcome this limitation’ we propose an end-to-end learning method for speech denoising based on Wavenet. The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature. Specifically, the model makes use of non-causal, dilated convolutions and predicts target fields instead of a single target sample. The discriminative adaptation of the model we propose, learns in a supervised fashion via minimizing a regression loss. These modifications make the model highly parallelizable during both training and inference. Both quantitative and qualitative evaluations indicate that the proposed method is preferred over Wiener filtering, a common method based on processing the magnitude spectrogram.

...read moreread less

387 citations

Proceedings Article•DOI•

End-to-end learning for music audio

[...]

Sander Dieleman¹, Benjamin Schrauwen¹•Institutions (1)

Ghent University¹

04 May 2014

TL;DR: Although convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase-and translation-invariant feature representations.

...read moreread less

Abstract: Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase-and translation-invariant feature representations.

...read moreread less

379 citations

Proceedings Article•

Binary Coding of Speech Spectrograms Using a Deep Auto-encoder

[...]

Li Deng¹, Michael L. Seltzer¹, Dong Yu¹, Alex Acero¹, Abdelrahman Mohamed², Geoffrey E. Hinton² - Show less +2 more•Institutions (2)

Microsoft¹, University of Toronto²

01 Sep 2010

TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a logspectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.

...read moreread less

Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-bylayer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlapand-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a logspectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine

...read moreread less

372 citations

Collapse

Network Information

Performance

Metrics

7,848

Papers

107,060

Citations

No. of papers in the topic in previous years
Year	Papers
2024	1
2023	627
2022	1,396
2021	488
2020	595
2019	593

Spectrogram

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics