Universal speech models for speaker independent single channel source separation

doi:10.1109/ICASSP.2013.6637625

Open Access, Proceedings Article

pp. 141-145

TLDR

This work proposes a method to learn a universal speech model from a general corpus of speech, shows how to use this model to separate speech from other sound sources, and shows that the method improves performance when training data of the non-speech source is available.

Abstract:

Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data of the non-speech source is available.
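The general idea of separating speech with a pre-trained speech dictionary can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic semi-supervised NMF with Kullback-Leibler multiplicative updates, in which a given speech dictionary stays fixed while a noise dictionary is learned from the mixture. The function name `semi_supervised_nmf`, its parameters, and the assumption that the input is a non-negative magnitude spectrogram are all illustrative choices, and how the universal speech dictionary itself is built is not covered here.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def semi_supervised_nmf(V, W_speech, n_noise, n_iter=100, seed=0):
    """Split a non-negative mixture spectrogram V (freq x frames) into
    speech and noise estimates.

    W_speech (freq x k_s) is a pre-trained speech dictionary and is kept
    fixed; a noise dictionary with n_noise columns is learned from V.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_s = W_speech.shape[1]

    # Random non-negative initialisation of the unknown factors.
    W_noise = rng.random((n_freq, n_noise)) + EPS
    H = rng.random((k_s + n_noise, n_frames)) + EPS
    ones = np.ones_like(V)

    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])

        # KL multiplicative update for all activations.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.T @ ones + EPS)

        # KL multiplicative update for the noise dictionary only;
        # the speech columns are never touched.
        R = V / (W @ H + EPS)
        H_noise = H[k_s:]
        W_noise *= (R @ H_noise.T) / (ones @ H_noise.T + EPS)

    # Soft mask: fraction of each time-frequency bin explained by speech.
    W = np.hstack([W_speech, W_noise])
    speech_mask = (W_speech @ H[:k_s]) / (W @ H + EPS)
    return speech_mask * V, (1.0 - speech_mask) * V


# Toy usage with random data standing in for real spectrograms.
rng = np.random.default_rng(1)
W_s = rng.random((16, 4))   # placeholder for a pre-trained speech dictionary
V = rng.random((16, 20))    # placeholder mixture magnitude spectrogram
speech_est, noise_est = semi_supervised_nmf(V, W_s, n_noise=3, n_iter=20)
```

Because the two estimates come from complementary masks, they always sum back to the mixture; in practice the masked magnitudes would be combined with the mixture phase and inverted to obtain time-domain signals.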

Citations

Proceedings Article

Adversarial Semi-Supervised Audio Source Separation Applied to Singing Voice Extraction

Daniel Stoller, +2 more

TL;DR: This work adopts adversarial training for music source separation with the aim of driving the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples.

Proceedings Article

Speaker and noise independent voice activity detection.

Francois G. Germain, +2 more

TL;DR: This paper proposes a VAD method based on non-negative matrix factorization that is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and significantly outperforms baseline algorithms.

Book Chapter

Single-channel audio source separation with NMF: divergences, constraints and algorithms

Cédric Févotte, +2 more

TL;DR: Presents the standard majorisation-minimisation strategy for NMF optimisation with the common \(\beta\)-divergence, a family of measures of fit that includes the quadratic cost, the generalised Kullback-Leibler divergence and the Itakura-Saito divergence as special cases.

Proceedings Article

Speech enhancement by sparse, low-rank, and dictionary spectrogram decomposition

Zhuo Chen, +1 more

TL;DR: Proposes a novel speech enhancement system that decomposes the spectrogram into a sparse activation of a dictionary of target speech templates plus a low-rank background model, making few assumptions about the noise other than its limited spectral variation.

Journal Article

Mixtures of Local Dictionaries for Unsupervised Speech Enhancement

Minje Kim, +1 more

- 01 Mar 2015 -

IEEE Signal Processing Letters

TL;DR: The proposed Mixture of Local Dictionaries (MLD) outperforms the state of the art by up to 2 dB in signal-to-distortion ratio, especially in the unsupervised setting where neither the speaker identity nor the type of noise is known in advance.

References

Book

The Elements of Statistical Learning

Trevor Hastie, +2 more

Journal Article

The Elements of Statistical Learning

Eric R. Ziegel

- 01 Aug 2003 -

Technometrics

TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.

Journal Article

Model selection and estimation in regression with grouped variables

Ming Yuan, +1 more

- 01 Feb 2006 -

Journal of The Royal Statistical Society...

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.

Journal Article

Enhancing Sparsity by Reweighted ℓ1 Minimization

Emmanuel J. Candès, +2 more

- 15 Oct 2008 -

Journal of Fourier Analysis and Applicat...

TL;DR: A novel method for sparse signal recovery that in many situations outperforms ℓ1 minimization in the sense that substantially fewer measurements are needed for exact recovery.

Journal Article

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds, +2 more

- 01 Jan 2000 -

Digital Signal Processing

TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
