Proceedings Article

Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures.

TLDR
This work uses a neural network to estimate masks for the target speaker and derives beamformer filters from these masks, in a similar way to the recently proposed approach for extracting speech in the presence of noise.
Abstract
In this work, we address the problem of extracting one target speaker from a multichannel mixture of speech. We use a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a similar way to the recently proposed approach for extraction of speech in the presence of noise. To overcome the permutation ambiguity of neural network mask estimation, which arises in the presence of multiple speakers, we propose to inform the neural network about the target speaker so that it learns to follow the speaker characteristics through the utterance. We investigate and compare different methods of passing the speaker information to the network, such as making one layer of the network dependent on speaker characteristics. Experiments on mixtures of two speakers demonstrate that the proposed scheme can track and extract a target speaker for both closed and open speaker set cases.
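
The step from estimated masks to beamformer filters can be illustrated with a minimal sketch. The abstract does not say which beamformer is derived from the masks, so the MVDR-style formula below is an assumption chosen for illustration, as are all function names, array shapes, and the diagonal-loading constant. The masks are taken as given inputs; the speaker-aware network that would produce them is not modeled here.

```python
import numpy as np

def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    # Sum over frames of mask(t, f) * y(t, f) y(t, f)^H.
    num = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    # Normalize by the total mask weight in each frequency bin.
    den = np.maximum(mask.sum(axis=0), 1e-10)
    return num / den[:, None, None]

def mvdr_filters(cov_target, cov_interf, ref_channel=0, eps=1e-6):
    """Assumed MVDR form: w(f) = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    u is a one-hot vector selecting the reference microphone.
    returns: complex array, shape (freqs, channels)
    """
    n_ch = cov_target.shape[-1]
    # Diagonal loading keeps the interference covariance invertible.
    loaded = cov_interf + eps * np.eye(n_ch)
    ratio = np.linalg.solve(loaded, cov_target)      # Phi_n^-1 Phi_s
    trace = np.trace(ratio, axis1=-2, axis2=-1)
    return ratio[..., ref_channel] / (trace[:, None] + 1e-10)

def apply_beamformer(filters, stft):
    """Beamformer output w(f)^H y(t, f), shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", filters.conj(), stft)
```

In use, one would compute `spatial_covariance` once with the target-speaker mask and once with the mask of everything else (for example `1 - target_mask`, itself an assumption), pass both to `mvdr_filters`, and apply the result to the multichannel STFT.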


Citations
Posted Content

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

TL;DR: This paper trains two networks: a speaker recognition network that produces speaker-discriminative embeddings, and a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input and produces a mask.
Proceedings Article

Continuous Speech Separation: Dataset and Analysis

TL;DR: A new real recording dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate conversations and capturing the audio replays with far-field microphones, which helps researchers develop systems that can be readily applied to real scenarios.
Proceedings Article

Single Channel Target Speaker Extraction and Recognition with Speaker Beam

TL;DR: This paper addresses the problem of single channel speech recognition of a target speaker in a mixture of speech signals by exploiting auxiliary speaker information provided by an adaptation utterance from the target speaker to extract and recognize only that speaker.
Journal Article

SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

TL;DR: This paper introduces SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker and shows the benefit of including speaker information in the processing and the effectiveness of the proposed method.
Proceedings Article

Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario.

TL;DR: A novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts the activity of each speaker in each time frame, outperforming the baseline x-vector-based system by more than 30% absolute Diarization Error Rate (DER).