Open Access · Proceedings Article

Universal speech models for speaker independent single channel source separation

TLDR
This work proposes a method to learn a universal speech model from a general corpus of speech, shows how to use this model to separate speech from other sound sources, and shows that the method improves performance when training data for the non-speech source is available.
Abstract
Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data of the non-speech source is available.
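
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general semi-supervised NMF setting it describes, not the authors' method. It assumes the speech model is a fixed non-negative dictionary `W_speech` learned beforehand from pooled speech spectrograms, that the non-speech source has no training data and its dictionary is learned from the mixture, and that Kullback-Leibler multiplicative updates are used. The names `kl_nmf`, `separate`, and `n_noise` are hypothetical, and any speaker-selection mechanism a universal model might need is left out.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, n_components, n_iter=200, seed=0):
    """Factor a non-negative spectrogram V (freq x time) as V ~ W @ H
    using multiplicative updates for the generalised KL divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + EPS
    H = rng.random((n_components, V.shape[1])) + EPS
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        R = V / (W @ H + EPS)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    # Normalise each dictionary column to sum to 1; move the scale into H.
    scale = W.sum(axis=0, keepdims=True) + EPS
    return W / scale, H * scale.T


def separate(V_mix, W_speech, n_noise=5, n_iter=200, seed=0):
    """Semi-supervised separation: W_speech stays fixed (assumed to be the
    pre-trained speech dictionary); the noise dictionary and all
    activations are estimated from the mixture itself."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V_mix.shape
    k_s = W_speech.shape[1]
    W_noise = rng.random((n_freq, n_noise)) + EPS
    H = rng.random((k_s + n_noise, n_frames)) + EPS
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        R = V_mix / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        R = V_mix / (W @ H + EPS)
        # Only the noise dictionary is updated; the speech part is frozen.
        W_noise *= (R @ H[k_s:].T) / (H[k_s:].sum(axis=1)[None, :] + EPS)
    speech = W_speech @ H[:k_s]
    noise = W_noise @ H[k_s:]
    mask = speech / (speech + noise + EPS)  # soft mask in [0, 1]
    return mask * V_mix, (1.0 - mask) * V_mix
```

In use, `kl_nmf` would be run once on magnitude spectrograms of the training speech to obtain `W_speech`, and `separate` would then be applied to the magnitude spectrogram of each mixture; the two returned estimates sum to the mixture by construction of the soft mask.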

Citations
Proceedings Article

Adversarial Semi-Supervised Audio Source Separation Applied to Singing Voice Extraction

TL;DR: This work adopts adversarial training for music source separation with the aim of driving the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples.
Proceedings Article

Speaker and noise independent voice activity detection.

TL;DR: This paper proposes a VAD method based on non-negative matrix factorization that is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and significantly outperforms baseline algorithms.
Book Chapter

Single-channel audio source separation with NMF: divergences, constraints and algorithms

TL;DR: This chapter presents the standard majorisation-minimisation strategy for optimising NMF under the common \(\beta\)-divergence, a family of measures of fit that includes the quadratic cost, the generalised Kullback-Leibler divergence and the Itakura-Saito divergence as special cases.
Proceedings Article

Speech enhancement by sparse, low-rank, and dictionary spectrogram decomposition

TL;DR: This work proposes a novel speech enhancement system that decomposes the spectrogram into a sparse activation of a dictionary of target speech templates plus a low-rank background model, which makes few assumptions about the noise other than its limited spectral variation.
Journal Article

Mixtures of Local Dictionaries for Unsupervised Speech Enhancement

TL;DR: The proposed Mixture of Local Dictionaries (MLD) outperforms the state of the art by up to 2 dB in signal-to-distortion ratio, especially in the unsupervised setting where neither the speaker identity nor the type of noise is known in advance.