Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures.

doi:10.21437/INTERSPEECH.2017-667

Proceedings Article

Kateřina Žmolíková, +5 more

- pp 2655-2659

TLDR

This work uses a neural network to estimate masks for the target speaker and derives beamformer filters from these masks, in a manner similar to a recently proposed approach for extracting speech in the presence of noise.
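The mask-to-beamformer step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an MVDR beamformer in the reference-channel formulation (the text here does not say which beamformer is used), and the function names, array layouts, and the `ref_channel` and `eps` parameters are hypothetical. Masks are taken as given, standing in for the neural network output.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Y:    complex STFT of the mixture, shape (channels, frames, freqs)
    mask: values in [0, 1], shape (frames, freqs)
    Returns an array of shape (freqs, channels, channels).
    """
    weighted = Y * mask[None, :, :]
    # Sum over frames of mask * y y^H, normalized by the total mask weight.
    cov = np.einsum("ctf,dtf->fcd", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]

def mvdr_filters(cov_target, cov_interf, ref_channel=0, eps=1e-10):
    """MVDR filters per frequency, shape (freqs, channels).

    Assumed formulation: w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S),
    where u selects the reference channel.
    """
    n_ch = cov_target.shape[-1]
    # Small diagonal loading keeps the interference covariance invertible.
    cov_interf = cov_interf + eps * np.eye(n_ch)
    numerator = np.linalg.solve(cov_interf, cov_target)  # (freqs, ch, ch)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[:, :, ref_channel] / (trace[:, None] + eps)

def apply_beamformer(filters, Y):
    """Beamformer output w^H y for each frame and frequency: (frames, freqs)."""
    return np.einsum("fc,ctf->tf", filters.conj(), Y)
```

In use, the target mask and its complement (or a separately estimated interference mask) would each be passed to `masked_covariance`, and the two resulting covariances to `mvdr_filters`.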

Abstract:

In this work, we address the problem of extracting one target speaker from a multichannel mixture of speech. We use a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a manner similar to the recently proposed approach for extracting speech in the presence of noise. To overcome the permutation ambiguity of neural network mask estimation, which arises in the presence of multiple speakers, we propose to inform the neural network about the target speaker so that it learns to follow the speaker characteristics through the utterance. We investigate and compare different methods of passing the speaker information to the network, such as making one layer of the network dependent on speaker characteristics. Experiments on mixtures of two speakers demonstrate that the proposed scheme can track and extract a target speaker in both closed and open speaker set cases.
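One way to make a single layer depend on speaker characteristics is to blend several candidate weight matrices using weights computed from a speaker vector. The sketch below is only an illustration of that idea under stated assumptions: the softmax blending, the ReLU activation, and all names (`speaker_dependent_layer`, `W_bases`, `b_bases`, `W_spk`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def speaker_dependent_layer(x, speaker_vec, W_bases, b_bases, W_spk):
    """Affine layer whose parameters are a speaker-weighted blend of K bases.

    x:           input features, shape (frames, in_dim)
    speaker_vec: vector describing the target speaker, shape (spk_dim,)
    W_bases:     K candidate weight matrices, shape (K, in_dim, out_dim)
    b_bases:     K candidate biases, shape (K, out_dim)
    W_spk:       maps the speaker vector to K blending logits, (spk_dim, K)
    """
    # Blending weights depend only on the speaker, not on the frame,
    # so the same speaker-specific layer is applied across the utterance.
    alpha = softmax(speaker_vec @ W_spk)        # (K,)
    W = np.tensordot(alpha, W_bases, axes=1)    # (in_dim, out_dim)
    b = alpha @ b_bases                         # (out_dim,)
    return np.maximum(x @ W + b, 0.0)           # ReLU (assumed activation)
```

Because the blending weights are fixed for a given speaker vector, changing the speaker vector changes which combination of bases processes the mixture, which is one way a network could be steered toward a particular target.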

Citations

Posted Content

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

Quan Wang, +9 more

- 11 Oct 2018 -

arXiv: Audio and Speech Processing

TL;DR: This paper uses two networks: a speaker recognition network that produces speaker-discriminative embeddings, and a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input and produces a mask.

Proceedings Article

Continuous Speech Separation: Dataset and Analysis

Zhuo Chen, +8 more

TL;DR: A new real-recording dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate conversations and capturing the audio replays with far-field microphones, which helps researchers develop systems that can be readily applied to real scenarios.

Proceedings Article

Single Channel Target Speaker Extraction and Recognition with Speaker Beam

Marc Delcroix, +4 more

TL;DR: This paper addresses the problem of single channel speech recognition of a target speaker in a mixture of speech signals by exploiting auxiliary speaker information provided by an adaptation utterance from the target speaker to extract and recognize only that speaker.

Journal Article

SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

Katerina Zmolikova, +6 more

- 13 Jun 2019 -

IEEE Journal of Selected Topics in Signa...

TL;DR: This paper introduces SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker and shows the benefit of including speaker information in the processing and the effectiveness of the proposed method.

Proceedings Article

Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario.

Ivan Medennikov, +11 more

TL;DR: This paper proposes a novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts the activity of each speaker on each time frame and outperforms the baseline x-vector-based system by more than 30% absolute Diarization Error Rate (DER).

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Book

Independent Component Analysis

Aapo Hyvärinen, +2 more

TL;DR: Independent component analysis is presented as a statistical generative model closely related to sparse coding: it gives a proper probabilistic formulation of the ideas underpinning sparse coding and can be interpreted as providing a Bayesian prior.

Journal Article

Image method for efficiently simulating small‐room acoustics

Jont B. Allen, +1 more

- 01 Nov 1976 -

Journal of the Acoustical Society of Ame...

TL;DR: This paper discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.

Journal Article

Performance measurement in blind audio source separation

Emmanuel Vincent, +2 more

- 01 Jul 2006 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.

Journal Article

Evaluation of Objective Quality Measures for Speech Enhancement

Yi Hu, +1 more

- 01 Jan 2008 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper reports the correlations of several objective measures with three subjective rating scales (signal distortion, noise distortion, and overall quality), and proposes several new composite objective measures that combine the individual measures using nonparametric and parametric regression analysis techniques.

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

Yi Luo, +1 more

- 01 Aug 2019 -

IEEE Transactions on Audio, Speech, and ...

Related Papers (5)

Deep clustering: Discriminative embeddings for segmentation and separation

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

The Kaldi Speech Recognition Toolkit
