Proceedings Article
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures
Kateřina Žmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani
pp. 2655-2659
TL;DR: This work uses a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a similar way as the recently proposed approach for extraction of speech in the presence of noise.
Abstract:
In this work, we address the problem of extracting one target speaker from a multichannel mixture of speech. We use a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a similar way as the recently proposed approach for extraction of speech in the presence of noise. To overcome the permutation ambiguity of neural network mask estimation, which arises in the presence of multiple speakers, we propose to inform the neural network about the target speaker so that it learns to follow the speaker characteristics throughout the utterance. We investigate and compare different methods of passing the speaker information to the network, such as making one layer of the network dependent on speaker characteristics. Experiments on mixtures of two speakers demonstrate that the proposed scheme can track and extract a target speaker for both closed and open speaker set cases.
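The abstract does not state which beamformer is derived from the masks. As a minimal sketch, assuming a mask-based MVDR beamformer in the Souden formulation (a common choice in mask-based beamforming; the paper may use another variant, such as GEV), the target and interference masks weight two spatial covariance matrices, and a per-frequency filter is computed from them. The function names, the reference channel, and the diagonal-loading constant `eps` are illustrative assumptions.

```python
import numpy as np


def spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    Y:    mixture STFT, shape (F, T, C) = (frequencies, frames, channels)
    mask: time-frequency mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    # sum_t m(f,t) y(f,t) y(f,t)^H, normalized by the total mask weight
    phi = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    return phi / np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]


def mask_based_mvdr(Y, target_mask, interf_mask, ref_channel=0, eps=1e-6):
    """Souden-style MVDR: w(f) = Phi_N^{-1} Phi_S u / trace(Phi_N^{-1} Phi_S).

    Returns the filters W, shape (F, C), and the beamformed STFT, shape (F, T).
    """
    C = Y.shape[2]
    phi_s = spatial_covariance(Y, target_mask)
    # Diagonal loading keeps the interference covariance invertible
    phi_n = spatial_covariance(Y, interf_mask) + eps * np.eye(C)[None]
    ratio = np.linalg.solve(phi_n, phi_s)          # Phi_N^{-1} Phi_S, shape (F, C, C)
    trace = np.trace(ratio, axis1=1, axis2=2)      # shape (F,)
    # Taking column ref_channel is the multiplication by the one-hot vector u
    W = ratio[:, :, ref_channel] / (trace[:, None] + eps)
    # Beamformer output x_hat(f, t) = w(f)^H y(f, t)
    X_hat = np.einsum("fc,ftc->ft", W.conj(), Y)
    return W, X_hat
```

In the setting described above, `target_mask` would come from the speaker-aware network; using its complement as `interf_mask` is a further assumption of this sketch.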
Citations
Posted Content
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron Weiss, Ye Jia, Ignacio Lopez Moreno
TL;DR: In this paper, the authors train two separate networks: a speaker recognition network that produces speaker-discriminative embeddings, and a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input and produces a mask.
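A toy sketch of the conditioning idea summarized above: the speaker embedding is repeated for every frame, concatenated to the mixture's magnitude spectrogram, and mapped to a mask that is multiplied with the mixture. The single dense layer is a hypothetical stand-in for the actual masking network (VoiceFilter itself uses a much larger convolutional and recurrent model), and all names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def speaker_conditioned_mask(noisy_mag, spk_emb, W, b):
    """Toy speaker-conditioned masking network (hypothetical single dense layer).

    noisy_mag: (T, F) magnitude spectrogram of the mixture
    spk_emb:   (D,) speaker embedding, e.g. from a speaker recognition network
    W, b:      (F + D, F) weights and (F,) bias of the dense layer
    Returns a mask in (0, 1) with shape (T, F).
    """
    T = noisy_mag.shape[0]
    # Repeat the embedding for every frame and append it to the spectral features
    features = np.concatenate([noisy_mag, np.tile(spk_emb, (T, 1))], axis=1)
    return sigmoid(features @ W + b)


def extract_target(noisy_mag, spk_emb, W, b):
    # Element-wise masking of the mixture spectrogram
    return speaker_conditioned_mask(noisy_mag, spk_emb, W, b) * noisy_mag
```

Because the embedding enters every frame's input, the same mixture yields different masks for different target speakers.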
Proceedings Article
Continuous Speech Separation: Dataset and Analysis
Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
TL;DR: A new real recording dataset, called LibriCSS, is derived from LibriSpeech by concatenating corpus utterances to simulate conversations and capturing the audio replays with far-field microphones; it is intended to help researchers develop systems that can be readily applied to real scenarios.
Proceedings Article
Single Channel Target Speaker Extraction and Recognition with Speaker Beam
TL;DR: This paper addresses the problem of single-channel speech recognition of a target speaker in a mixture of speech signals by exploiting auxiliary speaker information, provided by an adaptation utterance from the target speaker, to extract and recognize only that speaker.
Journal Article
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Katerina Zmolikova, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Tomohiro Nakatani, Lukas Burget, Jan Cernocky
TL;DR: This paper introduces SpeakerBeam, a method for extracting a target speaker from a mixture based on an adaptation utterance spoken by the target speaker, and shows the benefit of including speaker information in the processing and the effectiveness of the proposed method.
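One way to realize the speaker-dependent layer mentioned in the abstract at the top of this page is a factorized layer: K sub-layers process the mixture features, and their outputs are combined with speaker weights computed from the adaptation utterance. This is a minimal sketch under stated assumptions: the weights are obtained by time-averaging a projection of the adaptation features and applying a softmax, and the projection `P`, the softmax, the ReLU, and all names are illustrative, not the exact SpeakerBeam configuration.

```python
import numpy as np


def speaker_weights(adapt_feats, P):
    """Summarize an adaptation utterance into K positive weights summing to 1.

    adapt_feats: (T_a, F_in) features of the adaptation utterance
    P:           (F_in, K) projection (hypothetical; learned in practice)
    """
    # Average the frame-wise projections over time, then apply a softmax
    z = (adapt_feats @ P).mean(axis=0)
    z = np.exp(z - z.max())
    return z / z.sum()


def factorized_layer(x, Ws, bs, alpha):
    """Speaker-dependent layer: h = relu(sum_k alpha_k (x W_k + b_k)).

    x:     (T, F_in) mixture features
    Ws:    (K, F_in, F_out) sub-layer weights
    bs:    (K, F_out) sub-layer biases
    alpha: (K,) speaker weights
    """
    # Outputs of every sub-layer, shape (K, T, F_out)
    sub = np.einsum("tf,kfo->kto", x, Ws) + bs[:, None, :]
    # Speaker-weighted combination followed by a ReLU
    return np.maximum(np.einsum("k,kto->to", alpha, sub), 0.0)
```

With a one-hot `alpha`, the layer reduces to an ordinary dense layer, so the speaker weights effectively select or blend speaker-specific transforms.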
Proceedings Article
Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko
TL;DR: A novel Target-Speaker Voice Activity Detection (TS-VAD) approach directly predicts the activity of each speaker on each time frame, outperforming the baseline x-vector-based system by more than 30% absolute Diarization Error Rate (DER).
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
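The update rule summarized above can be sketched for a single parameter array as follows. The defaults are the hyperparameters suggested in the Adam paper (step size 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function name and the toy quadratic in the usage lines are illustrative.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero initialization of m and v
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Usage: minimize f(theta) = (theta - 3)^2 starting from theta = 0
theta, m, v = np.array(0.0), np.array(0.0), np.array(0.0)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
```

Because of the normalization by the square root of `v_hat`, the first step has magnitude close to `lr` regardless of the gradient's scale.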
Book
Independent Component Analysis
TL;DR: Independent component analysis is presented as a statistical generative model based on sparse coding; it is essentially a proper probabilistic formulation of the ideas underpinning sparse coding and can be interpreted as providing a Bayesian prior.
Journal Article
Image method for efficiently simulating small‐room acoustics
Jont B. Allen, David A. Berkley
TL;DR: The paper describes the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; when convolved with any desired input signal, this impulse response simulates room reverberation of the input signal.
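A simplified sketch of the image construction for a rectangular room. It assumes a single frequency-independent reflection coefficient `beta` shared by all six walls, delays rounded to the nearest sample, and no fractional-delay interpolation or post-filtering; the function name and parameters are illustrative. Each image source contributes an impulse attenuated by `beta` raised to its number of wall reflections and by spherical spreading, `1 / (4 * pi * distance)`.

```python
import itertools

import numpy as np


def image_method_rir(room, src, mic, beta, fs=16000, c=343.0, n_max=10, length=4096):
    """Simplified image-method room impulse response.

    room:     (Lx, Ly, Lz) room dimensions in metres
    src, mic: source and microphone positions inside the room (must differ)
    beta:     reflection coefficient shared by all walls
    n_max:    images are enumerated for integer indices -n_max..n_max per axis
    """
    room, src, mic = (np.asarray(a, dtype=float) for a in (room, src, mic))
    orders = range(-n_max, n_max + 1)
    # Each row holds (nx, ny, nz, px, py, pz)
    combos = np.array(list(itertools.product(orders, orders, orders, (0, 1), (0, 1), (0, 1))))
    n, p = combos[:, :3], combos[:, 3:]
    # Image coordinate per axis: 2 * n * L + (1 - 2 * p) * source coordinate
    images = 2 * n * room + (1 - 2 * p) * src
    # Number of wall reflections for each image
    reflections = np.abs(2 * n - p).sum(axis=1)
    dist = np.linalg.norm(images - mic, axis=1)
    delay = np.rint(dist / c * fs).astype(int)
    keep = delay < length
    h = np.zeros(length)
    # Accumulate contributions; several images may land on the same sample
    np.add.at(h, delay[keep], beta ** reflections[keep] / (4 * np.pi * dist[keep]))
    return h
```

Convolving a dry signal with `h` (e.g. via `np.convolve`) then gives the reverberant version described in the TL;DR.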
Journal Article
Performance measurement in blind audio source separation
TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
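A minimal sketch of the energy ratios for the simplest distortion set mentioned above (time-invariant gains), assuming no noise references so the noise term is dropped. The estimate is split into a target part (its projection onto the true source), an interference part (the extra component explained by the other true sources), and an artifact part (the remainder). The function name is illustrative; the other distortion sets in the paper enlarge the subspaces onto which the estimate is projected.

```python
import numpy as np


def bss_eval_gain(est, refs, j):
    """SDR, SIR and SAR in dB, allowing only a time-invariant gain.

    est:  (T,) estimated source
    refs: (n_src, T) true sources (assumed linearly independent)
    j:    index of the target source
    """
    s = refs[j]
    # Target component: orthogonal projection of est onto the true source j
    s_target = (est @ s) / (s @ s) * s
    # Orthogonal projection of est onto the span of all true sources
    coef, *_ = np.linalg.lstsq(refs.T, est, rcond=None)
    p_all = refs.T @ coef
    e_interf = p_all - s_target
    e_artif = est - p_all

    def ratio_db(a, b):
        return 10 * np.log10(np.sum(a ** 2) / np.sum(b ** 2))

    sdr = ratio_db(s_target, e_interf + e_artif)
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar
```

Because only a gain is allowed, rescaling the estimate leaves all three ratios unchanged.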
Journal Article
Evaluation of Objective Quality Measures for Speech Enhancement
Yi Hu, Philipos C. Loizou
TL;DR: The correlations of several objective measures with three subjective rating scales are evaluated, and several new composite objective measures are proposed by combining the individual objective measures using nonparametric and parametric regression analysis techniques.