VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Hannah Muckenhirn, Ignacio Lopez Moreno, John R. Hershey, Kevin W. Wilson, Prashant Sridhar, Quan Wang, Rif A. Saurous, Ron Weiss, Ye Jia, Zelin Wu
pp. 2728-2732
TLDR
A novel system that separates the voice of a target speaker from multi-speaker signals using a reference signal from that speaker. It does so by training two separate neural networks.
About:
This article was published in the Conference of the International Speech Communication Association on 2019-09-15 and is currently open access. It has received 149 citations to date. The article focuses on the topics Masking (art) and Spectrogram.
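The title and TLDR describe the core mechanism only in words: a spectrogram mask is conditioned on a reference embedding of the target speaker. The following is a minimal NumPy sketch of that conditioning step under stated assumptions, not the paper's implementation. It assumes a precomputed speaker embedding (d-vector) and replaces the learned masking network with a single random linear layer followed by a sigmoid. The function name `speaker_conditioned_mask`, the array sizes, and the weights are all hypothetical and untrained, so the output illustrates only the data flow and shapes, not separation quality.

```python
import numpy as np


def speaker_conditioned_mask(mixture_mag, d_vector, weights, bias):
    """Toy speaker-conditioned soft mask (hypothetical, untrained).

    mixture_mag: (T, F) magnitude spectrogram of the multi-speaker input.
    d_vector:    (D,) embedding of the target speaker's reference audio.
    weights:     (F + D, F) stand-in for a learned masking network.
    bias:        (F,) bias of that stand-in layer.
    Returns the (T, F) mask and the masked (enhanced) spectrogram.
    """
    num_frames = mixture_mag.shape[0]
    # Repeat the speaker embedding at every time frame and append it
    # to that frame's spectral features.
    tiled = np.tile(d_vector, (num_frames, 1))               # (T, D)
    features = np.concatenate([mixture_mag, tiled], axis=1)  # (T, F + D)
    # A sigmoid keeps every mask value strictly between 0 and 1.
    mask = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
    # Element-wise masking of the mixture spectrogram.
    return mask, mask * mixture_mag


rng = np.random.default_rng(0)
T, F, D = 50, 257, 256  # hypothetical frame, frequency-bin, and embedding sizes
mixture = np.abs(rng.standard_normal((T, F)))
d_vec = rng.standard_normal(D)
d_vec /= np.linalg.norm(d_vec)  # speaker embeddings are commonly L2-normalized
W = 0.01 * rng.standard_normal((F + D, F))
b = np.zeros(F)

mask, enhanced = speaker_conditioned_mask(mixture, d_vec, W, b)
print(mask.shape, enhanced.shape)  # prints (50, 257) (50, 257)
```

Because the same embedding is appended to every frame, the mask can change from speaker to speaker while the spectrogram path stays shared. In a trained system, the stand-in linear layer would be replaced by a learned network.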
Citations
Proceedings Article
Fully Supervised Speaker Diarization
TL;DR: A fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural network (UIS-RNN), which, given extracted speaker-discriminative embeddings, decodes in an online fashion, whereas most state-of-the-art systems rely on offline clustering.
Proceedings Article
Continuous Speech Separation: Dataset and Analysis
Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
TL;DR: A new real-recording dataset, called LibriCSS, is derived from LibriSpeech by concatenating corpus utterances to simulate conversations and capturing the audio replays with far-field microphones, which helps researchers develop systems that can be readily applied to real scenarios.
Posted Content
Wavesplit: End-to-End Speech Separation by Speaker Clustering
Neil Zeghidour, David Grangier
TL;DR: Wavesplit redefines the state of the art on clean mixtures of 2 or 3 speakers, as well as in noisy and reverberant settings, and sets a new benchmark on the recent LibriMix dataset.
Posted Content
Voice Separation with an Unknown Number of Multiple Speakers
TL;DR: A new method is presented for separating a mixed audio sequence in which multiple voices speak simultaneously. It greatly outperforms the current state of the art, which, as shown, is not competitive for more than two speakers.
Posted Content
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
TL;DR: This paper provides a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets; and objective functions.