A General Flexible Framework for the Handling of Prior Information in Audio Source Separation
Citations
Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR
Towards Scaling Up Classification-Based Speech Separation
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures
References
Maximum likelihood from incomplete data via the EM algorithm
A tutorial on hidden Markov models and selected applications in speech recognition
Performance measurement in blind audio source separation
Non-negative Matrix Factorization with Sparseness Constraints
Frequently Asked Questions (14)
Q2. What have the authors stated for future works in "A general flexible framework for the handling of prior information in audio source separation" ?
As for further research, the following extensions could be introduced to the framework. As for the spectral power, a flexible structure can be specified for the mixing parameters.
Q3. What is the initialization algorithm for instantaneous mixtures?
Rj × F × N, subject to the time-invariance constraint and, for instantaneous mixtures only, to the frequency-invariance constraint.
Q4. Why do the authors update parameters in a joint manner?
Since the authors can here update the parameters jointly without loss of flexibility, they do so: joint optimization, as compared to alternated optimization, generally leads to faster convergence.
Q5. After how many iterations of the proposed GEM algorithm were the separation results measured?
After 500 iterations of the proposed GEM algorithm the separation results, measured in terms of the source to distortion ratio (SDR) [48], were 7.2 and 8.9 dB for voice and guitar, respectively.
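The SDR from BSS Eval [48] measures the ratio of target-signal energy to distortion energy. A simplified sketch of the metric (ignoring the filtering and interference decompositions that BSS Eval allows; the function name is ours):

```python
import numpy as np

def sdr_db(reference, estimate):
    # Signal-to-distortion ratio in dB: energy of the reference source
    # divided by the energy of the estimation error.
    distortion = estimate - reference
    return 10 * np.log10(np.sum(reference**2) / np.sum(distortion**2))

# A correct estimate scaled by 0.9 still incurs distortion:
ref = np.array([1.0, -1.0, 1.0, -1.0])
est = 0.9 * ref
# error energy = 0.01 * 4 = 0.04; reference energy = 4; ratio = 100 -> 20 dB
```

The real BSS Eval SDR first projects the estimate onto allowed distortions of the reference, so it is invariant to gain; this sketch is not.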
Q6. What is the spectral power of a glottal source?
The spectral power of source j, denoted Vj, is assumed to be the product of an excitation spectral power Vj^ex, representing, e.g., the glottal excitation for voice or the plucking of a guitar string, and a filter spectral power Vj^ft, representing, e.g., the vocal tract or the impedance of the guitar body [23], [35].
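This excitation-filter decomposition can be sketched as an entrywise product of two nonnegative time-frequency power arrays (toy dimensions and variable names are ours):

```python
import numpy as np

F, N = 4, 3  # toy number of frequency bins and time frames
rng = np.random.default_rng(0)
V_ex = rng.random((F, N)) + 0.1  # excitation spectral power (e.g., glottal pulses)
V_ft = rng.random((F, N)) + 0.1  # filter spectral power (e.g., vocal tract)
V = V_ex * V_ft                  # source spectral power: entrywise product
```

Keeping both factors nonnegative guarantees that the resulting spectral power V is a valid (nonnegative) variance pattern.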
Q7. What is the EM algorithm update rule for Gj^ex?
EM algorithm update rules for the time pattern weights Gj^ex or Gj^ft with time-continuity priors, such as inverse-Gamma or Gamma Markov chain priors, can be found in [9].
Q8. What is the simplest way to denote the mixing parameters?
If the mixing parameters are given Gaussian priors, closed-form updates similar to (26), (27) can still be derived, since the modified log-posterior (18) remains a quadratic form with respect to the mixing parameters.
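To illustrate why a Gaussian prior preserves closed-form updates, consider a hypothetical one-dimensional regression analogue (not the paper's actual updates (26), (27)): the quadratic log-posterior is maximized by solving a single linear system, exactly as in ridge regression.

```python
import numpy as np

def map_mixing(X, y, a0, s2, s2_prior):
    # Maximize -||y - X a||^2 / (2 s2) - ||a - a0||^2 / (2 s2_prior) over a.
    # The gradient is linear in a, so the maximizer solves
    # (X^T X / s2 + I / s2_prior) a = X^T y / s2 + a0 / s2_prior.
    d = X.shape[1]
    A = X.T @ X / s2 + np.eye(d) / s2_prior
    b = X.T @ y / s2 + a0 / s2_prior
    return np.linalg.solve(A, b)

X = np.array([[1.0], [2.0]])
y = np.array([2.0, 4.0])
a0 = np.zeros(1)
a_loose = map_mixing(X, y, a0, 1.0, 1e8)   # nearly flat prior -> least squares
a_tight = map_mixing(X, y, a0, 1.0, 1e-8)  # dominant prior    -> close to a0
```

The two extremes show how the prior variance interpolates between the data-driven estimate and the prior mean.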
Q9. What is the e-step for estimating the model parameters?
First, given initial parameter values, the model parameters θ are estimated from the mixture X using an iterative GEM algorithm, where the E-step consists in computing the conditional expectation of the natural (sufficient) statistics given the mixture and the current parameters.
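Under the local Gaussian model, the quantity computed in the E-step reduces, in the single-channel case, to per-bin Wiener filtering of the mixture. A minimal sketch in our own toy notation (not the paper's multichannel formulation):

```python
import numpy as np

def e_step_posterior_mean(x, source_vars):
    # Zero-mean Gaussian sources in one time-frequency bin: the posterior
    # mean of source j given the mixture value x is the Wiener estimate
    # (v_j / sum_k v_k) * x, where v_j are the prior source variances.
    v = np.asarray(source_vars, dtype=float)
    return (v / v.sum()) * x

# The Wiener estimates are conservative: they sum back to the mixture.
s_hat = e_step_posterior_mean(3.0, [1.0, 2.0])
```

In the multichannel case the scalar gains become matrix Wiener gains built from the spatial covariances, but the structure of the E-step is the same.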
Q10. how many parameters are used to define the spectral patterns?
In order to further constrain the fine structure of the spectral patterns, they are represented as linear combinations of narrowband spectral patterns Wj^ex [14] with weights Uj^ex.
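This linear-combination constraint can be sketched as a matrix product (toy sizes; variable names are ours, not the paper's):

```python
import numpy as np

F, K, D = 6, 4, 2  # frequency bins, narrowband patterns, patterns for source j
rng = np.random.default_rng(1)
W_nb = np.abs(rng.standard_normal((F, K)))  # fixed narrowband spectral patterns
U = np.abs(rng.standard_normal((K, D)))     # adaptive combination weights
patterns = W_nb @ U                         # constrained full spectral patterns
```

Because only the K x D weight matrix is adapted while the narrowband dictionary stays fixed, the fine spectral structure (e.g., harmonicity) is preserved by construction.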
Q11. What are the spectral patterns of the bass and drums?
The narrowband spectral patterns Wj^ex (j = 9, …, 12) include 3 × L harmonic patterns modeling the harmonic part of L pitches (see [14]).
Q12. What is the spectral power of the mixing parameters?
The time-varying mixing parameters could be represented in terms of time-localized and locally time-invariant mixing parameter patterns, thus allowing the modeling of moving sources.
Q13. What are the main assumptions of the local Gaussian model-based framework?
While the local Gaussian model-based framework offers maximal flexibility, some methods do not satisfy (fully or partially) the aforementioned assumptions and are thus not strictly covered by the framework.
Q14. How can the authors implement rank-1 adaptive spatial time-invariant covariances?
This structure can be implemented in their framework by choosing rank-1 adaptive spatial time-invariant covariances, i.e., Aj is an adaptive tensor of size 2 × 1 × F × N subject to the time-invariance constraint, and by constraining the spectral power to Vj = Wj^ex Gj^ex Hj^ex, with Wj^ex being the F × F identity matrix, Gj^ex an F × ⌈N/L⌉ adaptive matrix, and Hj^ex the ⌈N/L⌉ × N fixed matrix with entries hj,mn^ex = 1 for n ∈ Lm and hj,mn^ex = 0 for n ∉ Lm.
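The fixed matrix Hj^ex ties the time weights to be constant over blocks Lm of L consecutive frames. A sketch of its construction (assuming 0-indexed frames grouped into consecutive blocks of L; the function name is ours):

```python
import numpy as np

def h_ex(N, L):
    # Fixed ceil(N/L) x N matrix with h[m, n] = 1 iff frame n lies in block m,
    # so multiplying by it holds the time weights constant over L-frame blocks.
    M = -(-N // L)  # ceil(N / L)
    H = np.zeros((M, N))
    for n in range(N):
        H[n // L, n] = 1.0
    return H

H = h_ex(N=5, L=2)  # 3 x 5 block-constancy matrix; last block is shorter
```

Each column has exactly one nonzero entry, so every frame belongs to exactly one block Lm.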