Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.
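For reference, a spectrogram is conventionally computed as the squared magnitude of a signal's short-time Fourier transform. The minimal sketch below uses SciPy on a synthetic tone; the window and segment length are illustrative choices, not tied to any of the papers listed here.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic test signal: one second of a 440 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# Power spectrogram: squared-magnitude STFT over 512-sample windows
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512)
print(Sxx.shape)  # (frequency bins, time frames)
```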


Papers
Journal ArticleDOI
TL;DR: Two missing-feature algorithms that reconstruct complete spectrograms from incomplete noisy ones are presented; they result in better overall recognition performance, are less expensive computationally, and do not require modification of the recognizer.

242 citations

01 Jan 2010
TL;DR: In this paper, median filtering is used to separate the harmonic and percussive parts of a monaural audio signal; the two resulting median-filtered spectrograms are used to generate masks, which are then applied to the original spectrogram.
Abstract: In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal, with median filtering performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering is also performed across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median-filtered spectrograms are then used to generate masks, which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
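A minimal sketch of the median-filtering separation described in the abstract, using NumPy/SciPy; the kernel sizes and the Wiener-style soft masks are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def hpss_median(x, fs, nperseg=1024, kernel=17):
    """Rough harmonic/percussive split by median filtering the magnitude
    spectrogram; kernel size and mask form are illustrative choices."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    S = np.abs(X)

    # Median filter across time frames: suppresses percussive events,
    # enhances harmonic components
    H = medfilt2d(S, kernel_size=(1, kernel))
    # Median filter across frequency bins: enhances percussive events,
    # suppresses harmonic components
    P = medfilt2d(S, kernel_size=(kernel, 1))

    # Soft masks derived from the two median-filtered spectrograms,
    # applied to the original complex spectrogram before inversion
    eps = 1e-10
    mask_h = H ** 2 / (H ** 2 + P ** 2 + eps)
    mask_p = P ** 2 / (H ** 2 + P ** 2 + eps)
    _, x_harmonic = istft(X * mask_h, fs=fs, nperseg=nperseg)
    _, x_percussive = istft(X * mask_p, fs=fs, nperseg=nperseg)
    return x_harmonic, x_percussive
```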

240 citations

Journal ArticleDOI
TL;DR: A sound event classification framework is outlined that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers, and is shown to compare very well with current state-of-the-art classification techniques.
Abstract: The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.
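As a hedged illustration of the "spectrogram image as front-end feature plus a support vector machine classifier" setup compared in the abstract, the sketch below builds fixed-size log-mel spectrogram images with librosa and feeds them to scikit-learn's SVC; the feature dimensions, library choices, and placeholder file paths are assumptions, not the paper's actual pipeline.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def spectrogram_image_feature(path, sr=16000, n_mels=64, n_frames=128):
    """Fixed-size log-mel spectrogram 'image' flattened into a feature vector."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S_db = librosa.power_to_db(S, ref=np.max)
    # Pad or truncate along time so every clip yields the same-size image
    if S_db.shape[1] < n_frames:
        S_db = np.pad(S_db, ((0, 0), (0, n_frames - S_db.shape[1])))
    return S_db[:, :n_frames].ravel()

# Hypothetical usage: train_paths and train_labels are placeholders
# X = np.stack([spectrogram_image_feature(p) for p in train_paths])
# clf = SVC(kernel="rbf").fit(X, train_labels)
```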

239 citations

Posted Content
TL;DR: A new network structure simulating complex-valued operations, called Deep Complex Convolution Recurrent Network (DCCRN), is proposed, in which both the CNN and RNN structures can handle complex-valued operations.
Abstract: Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or the speech spectrum via a naive convolutional neural network (CNN) or recurrent neural network (RNN). Some recent studies use the complex-valued spectrogram as a training target but train in a real-valued network, predicting the magnitude and phase components or the real and imaginary parts, respectively. In particular, the convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has been proven to be helpful for complex targets. In order to train the complex target more effectively, in this paper, we design a new network structure simulating the complex-valued operation, called Deep Complex Convolution Recurrent Network (DCCRN), where both CNN and RNN structures can handle complex-valued operations. The proposed DCCRN models are very competitive with previous networks on both objective and subjective metrics. With only 3.7M parameters, our DCCRN models submitted to the Interspeech 2020 Deep Noise Suppression (DNS) challenge ranked first for the real-time track and second for the non-real-time track in terms of Mean Opinion Score (MOS).
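A minimal PyTorch sketch of the complex-valued convolution that DCCRN-style networks build on: a complex product realized with two real convolutions. The layer sizes and tensor shapes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution from two real convolutions:
    (Wr + jWi) * (Xr + jXi) = (Wr*Xr - Wi*Xi) + j(Wr*Xi + Wi*Xr)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x_r, x_i):
        # Real and imaginary outputs follow the complex multiplication rule
        y_r = self.conv_r(x_r) - self.conv_i(x_i)
        y_i = self.conv_r(x_i) + self.conv_i(x_r)
        return y_r, y_i

# Illustrative shapes: real/imaginary noisy spectrograms, (batch, 1, freq, time)
x_r = torch.randn(2, 1, 257, 100)
x_i = torch.randn(2, 1, 257, 100)
layer = ComplexConv2d(1, 16, kernel_size=(3, 3), padding=1)
y_r, y_i = layer(x_r, x_i)
```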

237 citations

Journal ArticleDOI
TL;DR: A gated convolutional recurrent network (GCRN) for complex spectral mapping is proposed, which amounts to a causal system for monaural speech enhancement and yields significantly higher STOI and PESQ than magnitude spectral mapping and complex ratio masking.
Abstract: Phase is important for the perceptual quality of speech. However, it seems intractable to directly estimate phase spectra through supervised learning due to their lack of spectrotemporal structure. Complex spectral mapping aims to estimate the real and imaginary spectrograms of clean speech from those of noisy speech, which simultaneously enhances the magnitude and phase responses of speech. Inspired by multi-task learning, we propose a gated convolutional recurrent network (GCRN) for complex spectral mapping, which amounts to a causal system for monaural speech enhancement. Our experimental results suggest that the proposed GCRN substantially outperforms an existing convolutional neural network (CNN) for complex spectral mapping in terms of both objective speech intelligibility and quality. Moreover, the proposed approach yields significantly higher STOI and PESQ than magnitude spectral mapping and complex ratio masking. We also find that complex spectral mapping with the proposed GCRN provides an effective phase estimate.
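A rough sketch of the complex spectral mapping objective described above: train a network to map the real and imaginary spectrograms of noisy speech to those of clean speech under an MSE loss. The placeholder network and tensor shapes are assumptions for illustration; this is not the paper's GCRN.

```python
import torch
import torch.nn as nn

def complex_mapping_loss(model, noisy_stft, clean_stft):
    """MSE between estimated and clean real/imaginary spectrograms.
    Complex STFTs of shape (batch, freq, time) are stacked into
    (batch, 2, freq, time) real-valued tensors."""
    noisy = torch.stack([noisy_stft.real, noisy_stft.imag], dim=1)
    clean = torch.stack([clean_stft.real, clean_stft.imag], dim=1)
    estimate = model(noisy)
    return nn.functional.mse_loss(estimate, clean)

# Trivial stand-in network for the GCRN, just to show the data flow
net = nn.Conv2d(2, 2, kernel_size=3, padding=1)
noisy = torch.randn(4, 257, 100, dtype=torch.cfloat)
clean = torch.randn(4, 257, 100, dtype=torch.cfloat)
loss = complex_mapping_loss(net, noisy, clean)
```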

237 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593