Open Access · Posted Content

Multi-Microphone Complex Spectral Mapping for Speech Dereverberation

TLDR
A multi-microphone complex spectral mapping approach to speech dereverberation is proposed, its integration with beamforming and post-filtering is investigated, and experimental results on multi-channel speech dereverberation demonstrate its effectiveness.
Abstract
This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. In the proposed approach, a deep neural network (DNN) is trained to predict the real and imaginary (RI) components of direct sound from the stacked reverberant (and noisy) RI components of multiple microphones. We also investigate the integration of multi-microphone complex spectral mapping with beamforming and post-filtering. Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach.
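As a rough illustration of the input representation the abstract describes, the RI components of each microphone's STFT can be stacked channel-wise before being fed to the DNN. The following numpy sketch shows only this stacking step; the function name and array shapes are assumptions for illustration, not the paper's code:

```python
import numpy as np

def stack_ri(stfts):
    """Stack real and imaginary parts of P microphone STFTs into a
    (2P, T, F) feature tensor, the assumed network input."""
    ri = [np.stack([s.real, s.imag]) for s in stfts]  # each (2, T, F)
    return np.concatenate(ri, axis=0)                 # (2P, T, F)

# toy example: 4 microphones, 10 frames, 257 frequency bins
rng = np.random.default_rng(0)
stfts = [rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))
         for _ in range(4)]
features = stack_ri(stfts)
print(features.shape)  # (8, 10, 257)
```

The network would then be trained to map this stacked tensor to the RI components of the direct sound at a reference microphone.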


Citations
Posted Content

A consolidated view of loss functions for supervised deep learning-based speech enhancement

TL;DR: This work investigates a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing and reveals that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced.
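One hypothetical instance of such a combined objective pairs a phase-aware term (a distance on the complex STFT, which penalizes phase errors) with a magnitude-only term; the exact weighting and norms here are illustrative assumptions, not the paper's specific losses:

```python
import numpy as np

def combined_loss(est, ref, alpha=0.5):
    """Blend a phase-aware complex-domain L1 term with a magnitude-only
    L1 term; alpha trades the two off."""
    phase_aware = np.mean(np.abs(est - ref))                # penalizes phase too
    magnitude = np.mean(np.abs(np.abs(est) - np.abs(ref)))  # magnitude only
    return alpha * phase_aware + (1 - alpha) * magnitude

rng = np.random.default_rng(1)
ref = rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))
print(combined_loss(ref, ref))       # 0.0: perfect estimate
print(combined_loss(1j * ref, ref))  # > 0: same magnitude, wrong phase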
Journal ArticleDOI

Neural Spectrospatial Filtering

TL;DR: The proposed neural spectrospatial filter achieves separation performance comparable to or better than beamforming across different array geometries and speech separation tasks, reduces to monaural complex spectral mapping in single-channel conditions, and is concluded to be a strong alternative to traditional and mask-based beamforming.
Proceedings ArticleDOI

A consolidated view of loss functions for supervised deep learning-based speech enhancement

TL;DR: In this article, the authors investigated a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing and found that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced.
Journal ArticleDOI

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

TL;DR: TF-GridNet as mentioned in this paper is a multi-path deep neural network (DNN) integrating full- and sub-band modeling in the T-F domain, which achieves state-of-the-art performance on the multi-channel tasks of SMS-WSJ and WHAMR!.
Journal ArticleDOI

STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency

TL;DR: Compared with Conv-TasNet, the STFT-domain system can achieve better enhancement performance for a comparable amount of computation, or comparable performance with less computation, maintaining strong performance at an algorithmic latency as low as 2 ms.
References
Book ChapterDOI

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. proposed a network and training strategy that relies on strong use of data augmentation to use the available annotated samples more efficiently; it can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks.
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
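The dense connectivity pattern the TL;DR describes can be sketched with toy layers in numpy; the layer function below is a stand-in for a real convolution and exists only to show the channel-wise concatenation, which is what gives DenseNet its feature reuse:

```python
import numpy as np

def make_layer(growth_rate=2):
    """Toy stand-in for a conv layer: maps any number of input
    channels to `growth_rate` output channels."""
    def layer(x):  # x: (C, H, W)
        pooled = np.tanh(x.mean(axis=0, keepdims=True))
        return np.repeat(pooled, growth_rate, axis=0)
    return layer

def dense_block(x, layers):
    """DenseNet connectivity: layer l receives the channel-wise
    concatenation of the input and all preceding layers' outputs."""
    feats = [x]
    for layer in layers:
        feats.append(layer(np.concatenate(feats, axis=0)))
    return np.concatenate(feats, axis=0)

x = np.zeros((4, 8, 8))  # 4 input channels
out = dense_block(x, [make_layer() for _ in range(3)])
print(out.shape)  # (10, 8, 8): 4 input + 3 layers * growth_rate 2
```

Because every layer's output is carried forward, gradients reach early layers directly, which is the mechanism behind the alleviated vanishing-gradient problem mentioned above.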
Proceedings ArticleDOI

Deep clustering: Discriminative embeddings for segmentation and separation

TL;DR: In this paper, a deep network is trained to assign contrastive embedding vectors to each time-frequency region of the spectrogram in order to implicitly predict the segmentation labels of the target spectrogram from the input mixtures.
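The deep clustering objective pulls embeddings of same-source T-F bins together by matching the embedding affinity matrix to the label affinity matrix. A minimal sketch of that loss, using the standard low-rank expansion so the large N x N affinity matrices are never formed (variable names are illustrative):

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """||V V^T - Y Y^T||_F^2 for embeddings V (N, D) and one-hot
    source labels Y (N, S), expanded so only small D x D, D x S,
    and S x S products are computed."""
    return (np.sum((V.T @ V) ** 2)
            - 2 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

# one-hot speaker labels for 6 T-F bins, 2 speakers
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1]], dtype=float)
print(deep_clustering_loss(Y, Y))  # 0.0: embeddings equal to ideal labels
```

At test time the learned embeddings are clustered (e.g. with k-means) to recover binary T-F masks for each source.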
Journal ArticleDOI

Supervised Speech Separation Based on Deep Learning: An Overview

TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.
Journal ArticleDOI

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

TL;DR: In this article, the utterance-level permutation invariant training (uPIT) technique was proposed for speaker independent multitalker speech separation, where RNNs, trained with uPIT, can separate multitalker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender.
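The core of uPIT is that the loss is minimized over speaker permutations at the utterance level, so one output stream tracks one speaker for the whole utterance. A minimal numpy sketch, assuming a plain MSE as the per-pair loss (the original works with T-F domain losses):

```python
import numpy as np
from itertools import permutations

def upit_loss(est, ref):
    """Utterance-level PIT: score every assignment of estimated to
    reference speakers with an utterance-wide MSE, keep the best."""
    S = len(est)
    return min(
        sum(np.mean((est[p[s]] - ref[s]) ** 2) for s in range(S)) / S
        for p in permutations(range(S))
    )

rng = np.random.default_rng(2)
ref = [rng.standard_normal(100) for _ in range(2)]
print(upit_loss([ref[1], ref[0]], ref))  # 0.0: correct up to speaker order
```

Because the permutation is chosen once per utterance rather than per frame, the trained network needs no prior knowledge of speaker identity or gender, as the TL;DR notes.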