VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

doi:10.21437/INTERSPEECH.2019-1101

Hannah Muckenhirn, +9 more

- pp 2728-2732

TLDR

A novel system that separates the voice of a target speaker from multi-speaker signals by using a reference signal from that speaker. The system is built by training two separate neural networks.
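As a rough illustration of what "speaker-conditioned spectrogram masking" means, the sketch below derives an embedding from a reference signal, predicts a soft mask for each mixture frame from the frame concatenated with that embedding, and multiplies the mask into the mixture magnitudes. Everything in it is an assumption made for illustration: the function names, the averaging "encoder", the single linear layer, and the hand-picked weights are hypothetical stand-ins, not the networks or parameters described in the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def speaker_embedding(reference_frames):
    """Hypothetical stand-in for a speaker encoder: average the reference frames."""
    n = len(reference_frames)
    dim = len(reference_frames[0])
    return [sum(frame[i] for frame in reference_frames) / n for i in range(dim)]


def predict_mask(mixture_frame, embedding, weights, bias):
    """Hypothetical stand-in for a masking network.

    One linear layer plus a sigmoid per frequency bin, applied to the
    mixture frame concatenated with the speaker embedding.
    """
    features = mixture_frame + embedding  # list concatenation
    return [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_mask(mixture, embedding, weights, bias):
    """Multiply each mixture magnitude by its predicted mask value in (0, 1)."""
    enhanced = []
    for frame in mixture:
        mask = predict_mask(frame, embedding, weights, bias)
        enhanced.append([m * v for m, v in zip(mask, frame)])
    return enhanced


# Toy magnitude "spectrograms": a list of frames, each with 2 frequency bins.
reference = [[1.0, 0.0], [0.8, 0.2]]  # target speaker energy sits mostly in bin 0
mixture = [[1.0, 1.0], [0.5, 2.0]]    # target speaker plus an interfering source

# Untrained, hand-picked weights (assumption): they react only to the embedding.
weights = [[0.0, 0.0, 4.0, -4.0],
           [0.0, 0.0, -4.0, 4.0]]
bias = [0.0, 0.0]

emb = speaker_embedding(reference)
enhanced = apply_mask(mixture, emb, weights, bias)
print(enhanced)
```

With these toy values the mask keeps most of bin 0 and suppresses bin 1, because the reference signal is concentrated in bin 0. In a real system both the encoder and the mask predictor would be trained networks.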

About:

This article was published in the Conference of the International Speech Communication Association on 2019-09-15 and is currently open access. It has received 149 citations to date. The article focuses on the topics: Masking (art) & Spectrogram.

Citations

Proceedings Article

Fully Supervised Speaker Diarization

Aonan Zhang, +4 more

TL;DR: A fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), which operates on extracted speaker-discriminative embeddings and decodes in an online fashion, whereas most state-of-the-art systems rely on offline clustering.

Proceedings Article

Continuous Speech Separation: Dataset and Analysis

Zhuo Chen, +8 more

TL;DR: A new dataset of real recordings, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate conversations and capturing the audio replays with far-field microphones, which helps researchers develop systems that can be readily applied to real scenarios.

Posted Content

Wavesplit: End-to-End Speech Separation by Speaker Clustering

Neil Zeghidour, +1 more

- 20 Feb 2020 -

arXiv: Audio and Speech Processing

TL;DR: Wavesplit redefines the state of the art on clean mixtures of 2 or 3 speakers, as well as in noisy and reverberated settings, and sets a new benchmark on the recent LibriMix dataset.

Posted Content

Voice Separation with an Unknown Number of Multiple Speakers

Eliya Nachmani, +2 more

- 29 Feb 2020 -

arXiv: Audio and Speech Processing

TL;DR: A new method is presented for separating a mixed audio sequence in which multiple voices speak simultaneously. It greatly outperforms the current state of the art, which is shown not to be competitive for more than two speakers.

Posted Content

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

Daniel Michelsanti, +6 more

- 21 Aug 2020 -

arXiv: Audio and Speech Processing

TL;DR: This paper provides a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets; and objective functions.
