What is the future work in this paper?

Future work will aim to improve the proposed method by reducing its computational time.

What is the effect of the proposed method on the separation performance for the audio sources?

The proposed method can automatically detect the optimal number of components for each individual source, which leads to more robust separation results than those of the comparison methods.

Why is the proposed method computationally complex?

Because the proposed method updates its parameters iteratively and computes the nonnegative matrix decomposition given by the two imitated channels.

What is the novelty of the artificial-stereo mixture?

The novelty of the artificial-stereo mixture is the emergence of a new diversity, in the form of the sources' temporal correlation, within the context of SCBSS.
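The claimed diversity rests on different sources having different temporal correlation. A minimal sketch, assuming two synthetic AR(1) sources with hypothetical coefficients 0.9 and 0.2 (not taken from the paper), shows that the lag-1 autocorrelation differs from source to source, which is the kind of source-specific temporal structure a time-shifted copy of the mixture can expose.

```python
import random

def ar1_source(a, n, seed):
    """Generate an AR(1) signal s(t) = a * s(t-1) + e(t) driven by unit-variance white noise."""
    rng = random.Random(seed)
    s = [0.0] * n
    for t in range(1, n):
        s[t] = a * s[t - 1] + rng.gauss(0.0, 1.0)
    return s

def lag_correlation(s, lag):
    """Sample autocorrelation of s at the given lag, normalized by the lag-0 energy."""
    n = len(s)
    mean = sum(s) / n
    c = [v - mean for v in s]
    num = sum(c[t] * c[t - lag] for t in range(lag, n))
    den = sum(v * v for v in c)
    return num / den

# Two hypothetical sources with strong and weak temporal correlation.
s1 = ar1_source(0.9, 20000, seed=1)
s2 = ar1_source(0.2, 20000, seed=2)
print(round(lag_correlation(s1, 1), 2), round(lag_correlation(s2, 1), 2))
```

For an AR(1) process the lag-d autocorrelation is a**d, so the two sources decorrelate at different rates as the shift grows.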

What is the performance of the proposed method?

The proposed imitated-stereo method outperforms DUET, SNMF2D, EMD-ICA, SCICA and Hilbert-SD, with a total average improvement of 5.82 dB per source.

What is the purpose of the proposed method?

The proposed method aims to estimate the original signals $[s_1(t), s_2(t), \ldots, s_J(t)]$ by formulating an imitated-stereo mixture, given only one observed mixture $x(t)$.

What is the performance improvement of the proposed method?

In terms of percentage, the average performance improvements of the proposed method over the comparison methods are 92.9%, 140.3%, 242.1%, 497.0% and 311.1%, respectively.

nodrums (bass / lead G / rhythmic G):
Proposed method: 8.85, 31.85, 8.84
DUET: 5.19, 14.71, 5.43
SNMF2D: 4.45, 12.15, 6.13
EMD-ICA: 2.79, 14.12, 1.97
SCICA: 1.43, 13.50, 2.57
Hilbert-SD: 3.62, 13.04, 5.22

Shannonsongs Sunrise (drum / vocal / piano):
Proposed method: 3.79, 12.83, 3.85
DUET
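The percentages above are given without their formula. A minimal sketch, assuming each one is the relative gain (proposed − baseline) / baseline × 100 of a single performance measure. Applied to the first numeric column of the nodrums values, it yields per-mixture gains that need not match the reported averages, because those averages pool all test mixtures.

```python
def relative_improvement(proposed, baseline):
    """Percentage gain of `proposed` over `baseline` (assumed definition)."""
    return (proposed - baseline) / baseline * 100.0

# First numeric column of the nodrums rows listed above.
proposed = 8.85
baselines = {"DUET": 5.19, "SNMF2D": 4.45, "EMD-ICA": 2.79, "SCICA": 1.43, "Hilbert-SD": 3.62}
for name, value in baselines.items():
    print(f"{name}: {relative_improvement(proposed, value):.1f}%")
```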

(Open Access) Single-Channel Signal Separation Using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization (2019) | Phetcharat Parathai

What does the proposed method contribute to resolving the spectral bases and the temporal codes?

Because the proposed method assigns a regularization parameter to each temporal code (individually optimized and adaptively tuned to yield the optimal sparse factorization), this Bayesian regularization improves the accuracy of resolving the spectral bases and the temporal codes, which was previously not possible using cNTF alone.
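The answer above refers to spectral bases and temporal codes inside a tensor factorization. As an illustrative sketch only (a plain PARAFAC model, not the paper's Bayesian sparse NTF, with made-up factor values), a two-channel nonnegative tensor can be built from per-channel gains Q, spectral bases W and temporal codes H.

```python
def parafac_reconstruct(Q, W, H):
    """Rebuild a 3-way tensor V[i][f][t] = sum_k Q[i][k] * W[f][k] * H[k][t].

    Q: channels x K gains, W: frequencies x K spectral bases, H: K x frames temporal codes.
    """
    I, F, T, K = len(Q), len(W), len(H[0]), len(H)
    return [[[sum(Q[i][k] * W[f][k] * H[k][t] for k in range(K))
              for t in range(T)]
             for f in range(F)]
            for i in range(I)]

# Hypothetical factors: 2 channels (real and imitated), 3 frequency bins, 2 frames, K = 2.
Q = [[1.0, 1.0], [0.5, 2.0]]
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
H = [[2.0, 0.0], [0.0, 3.0]]
V = parafac_reconstruct(Q, W, H)
print(V)
```

The channel-1 to channel-0 ratio differs per component (0.5 for the first, 2.0 for the second), which is the kind of inter-channel diversity a second channel is meant to supply.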



 !"#$%&%

$%  $'  (  $'%    %  )  $'  *  

+,    $-    $%    ./ !#  ''  01/2&0/!2  3$$

1/& /!4

%-$'

(  '555! ! 15 .6& !"& !!02&6  7'555! ! 15 .6& !"&

!!02&68

  *  )  )%  9      

'55%55'5."/2.5

(*-*%'#%

(*-:''-;%9

-*%#5'-)$%'

99%%''%-'9*'-

99'%-%&9&'<''

)  '  '      '*      %    9%%  %'

%*)%%-'%5(%'

-)-+%%%%%--

9)9%'9'-% 9%%'%-

*%%%'55%5'%%

-=9<%'%*9

*%%%)'%'%59

'%*  9'%  *'%:) '

->#

P. Parathai, N. Tengtrairat, W. L. Woo, and Bin Gao

Abstract -- A novel approach to single-channel signal separation (SCSS) is presented. The proposed sparse nonnegative tensor factorization is formulated under the maximum a posteriori probability framework and adaptively fine-tuned using a hierarchical Bayesian approach with a new mixture model. The mixture model is analogous to a stereo signal captured by one real and one virtual microphone. An "imitated-stereo" mixture model is thus developed by weighting and time-shifting the original single-channel mixture. This yields an artificial dual-channel mixing system, which gives rise to a new form of spectral basis correlation diversity among the sources. A principal difficulty underlying all factorization algorithms is estimating an adequate number of latent components for each signal. This paper addresses this difficulty by developing a framework for pruning unnecessary components and by incorporating modified multivariate rectified Gaussian prior information into the spectral basis features. The parameters of the imitated-stereo model are estimated via the proposed sparse nonnegative tensor factorization with the Itakura-Saito divergence. In addition, the separability conditions of the proposed mixture model are derived, and it is demonstrated that the proposed method can separate mixtures captured in real time. Experiments on real audio sources have been conducted to verify the capability of the proposed method.
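The abstract names the Itakura-Saito divergence as the fitting cost. A minimal sketch of the divergence itself (not of the paper's full factorization), assuming strictly positive power values, shows its scale invariance: scaling the data and the model by the same constant leaves the divergence unchanged, so low-energy time-frequency bins count as much as high-energy ones.

```python
import math

def itakura_saito(v, v_hat):
    """Itakura-Saito divergence d_IS(v | v_hat), summed over paired positive entries.

    Each term is v/v_hat - log(v/v_hat) - 1, which is zero only when v == v_hat.
    """
    total = 0.0
    for a, b in zip(v, v_hat):
        r = a / b
        total += r - math.log(r) - 1.0
    return total

# Same ratios at two different scales give the same divergence.
print(itakura_saito([2.0, 4.0], [1.0, 3.0]), itakura_saito([20.0, 40.0], [10.0, 30.0]))
```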

Keywords -- Blind source separation, underdetermined mixture, tensor factorization, unsupervised learning, multiplicative updates, source modeling.

School of Software Engineering, Payap University, Chiang Mai, Thailand: phetcharat@payap.ac.th

School of Software Engineering, Payap University, Chiang Mai, Thailand: naruephorn_t@payap.ac.th

Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, England, United Kingdom: wai.l.woo@northumbia.ac.uk

School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China: bin_gao@uestc.edu.cn

Single-Channel Signal Separation using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization

1 INTRODUCTION

Blind source separation (BSS) [29, 47] is the process of separating individual source signals without using training information about the sources. BSS is flourishing in numerous fields, including underwater signal processing [31], communications [27], speech enhancement [37], biomedical signal processing [14] and audio signal recognition [42]. One classical BSS problem is the so-called "cocktail party problem" [4], a psychoacoustic phenomenon that refers to the remarkable human ability to attend to and recognize a speaker in an interfering environment. An extreme case of BSS is termed single-channel blind source separation (SCBSS). SCBSS aims to discover individual source signals from a single mixture recording without any a priori information about the sources. Since the number of source signals, $\{s_j(t)\}_{j=1}^{J}$, is greater than the number of observed mixtures (the single mixture $x(t)$), this is known as the underdetermined SCBSS problem [2, 12, 20, 30, 33, 44].

Many algorithms have been successfully developed for SCBSS. The conventional ICA method [19] was adapted to the SCBSS case, which is known as single-channel ICA (SCICA). In [1, 21, 28, 40], an SCICA method is proposed that maps an observed single-channel mixture into a multi-channel model by breaking the observed vector into a sequence of contiguous blocks. These blocks are treated as a matrix to which standard ICA can then be applied to estimate the underlying sources. The SCICA method generally has two major drawbacks: first, the algorithm assumes stationary sources; second, the sources are assumed to be disjoint in the frequency domain. These assumptions, however, do not always hold in applications. In the SCICA method, the sources are modeled as a sparse combination of a set of time-domain basis functions, which are initially derived using standard ICA. This method yields optimal separation when the ICA basis functions corresponding to each source have minimal time-domain overlap. When the basis functions overlap significantly, e.g., in a mixture of two speech sources, or when the basis functions of two sources are very similar, the method performs poorly. In [46], multi-component radar or signal-dependent transforms [10, 32] were applied to a single-channel mixture to generate a multi-channel mixture, which is subsequently separated by ICA. Another approach to decomposing a signal of interest into different sources is nonnegative matrix factorization (NMF) [24].
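The SCICA construction described above turns one observed vector into a matrix of contiguous blocks so that standard ICA can treat the rows as observations. A minimal sketch of only the block-building step, assuming a hypothetical block length and hop (the ICA stage itself is omitted):

```python
def block_matrix(x, block_len, hop=1):
    """Break a 1-D signal into contiguous blocks of length `block_len`, one block per row.

    With hop=1 consecutive rows overlap (a delay embedding); with hop=block_len they do not.
    """
    if block_len <= 0 or hop <= 0:
        raise ValueError("block_len and hop must be positive")
    return [x[start:start + block_len]
            for start in range(0, len(x) - block_len + 1, hop)]

signal = [1, 2, 3, 4, 5, 6]
print(block_matrix(signal, 3))          # overlapping blocks
print(block_matrix(signal, 3, hop=3))   # non-overlapping blocks
```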

NMF has been used for sound source separation of single-channel mixtures, using the multiplicative update (MU) algorithm to solve its parametric optimization with the least-squares distance and the Kullback-Leibler divergence as cost functions [25, 34, 35]. Other families of cost functions were subsequently proposed, for example the Beta divergence [22], Csiszár's divergences [5] and the Itakura-Saito divergence [7]. A popular method in this category is sparse nonnegative matrix factorization (SNMF) [15], in which sparseness constraints can be included in the cost function. Two-dimensional sparse NMF deconvolution (SNMF2D) [3, 11] uses a double convolution to model both the spreading of the spectral bases and the variation of the temporal structure inherent in the signals. In [23], sources are assumed to be non-stationary and nonnegative. The canonical tensor and least-squares method is used to estimate the mixing model, and the sources are then recovered by a minimum mean-squared error beamformer without any hypothetical limitation. In a parallel development, NTF under a parallel factor analysis (PARAFAC) structure, where the channel spectrograms are jointly modeled by a 3-valence tensor, has been introduced in [8, 36]. Clustering of the spatial cues to group the NTF components (cNTF) was developed in [6] for multichannel audio source separation. In most applications, if the number of components is too small, the data do not fit the model well; conversely, if it is too large, overfitting occurs. Choosing the right model is particularly challenging in the PARAFAC model, as the number of components is specified for each modality separately. While these approaches increase the accuracy of matrix factorization, they only work when a large sample dataset is available. Moreover, the sparsity parameter is determined manually, which can cause over- or under-sparsity that degrades separation performance. To find an elegant solution to this dichotomy between data fidelity and overfitting, it is crucial that the "right" model order of components is selected.
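The MU scheme mentioned above can be illustrated with its least-squares form. A minimal pure-Python sketch of the classic Lee-Seung multiplicative updates (not the paper's Itakura-Saito-based sparse NTF); the toy matrix, rank, iteration count and small `eps` guard are assumptions for illustration.

```python
import random

def matmul(A, B):
    """Matrix product of nested lists: (n x m) times (m x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose a nested-list matrix."""
    return [list(row) for row in zip(*A)]

def ls_cost(V, W, H):
    """Least-squares cost ||V - W H||_F^2."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

def nmf_mu(V, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~= W H under the least-squares cost.

    Returns W (F x rank), H (rank x T) and the cost recorded after every iteration.
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(rank)]
    costs = [ls_cost(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][t] * num[k][t] / (den[k][t] + eps) for t in range(T)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)] for f in range(F)]
        costs.append(ls_cost(V, W, H))
    return W, H, costs

# Toy nonnegative "spectrogram" built from two made-up components.
W_true = [[1.0, 0.2], [0.5, 0.5], [0.1, 1.0]]
H_true = [[1.0, 2.0, 0.5, 0.3], [0.2, 1.0, 2.0, 1.0]]
V_toy = matmul(W_true, H_true)
W_est, H_est, costs = nmf_mu(V_toy, rank=2)
print(f"initial cost {costs[0]:.4f}, final cost {costs[-1]:.6f}")
```

Multiplying by nonnegative ratios keeps every factor nonnegative without any projection step, which is why MU is the default solver in this family.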

In this paper, a new framework for single-channel blind source separation (SCBSS) is proposed. The proposed solution separates sources from a single channel without relying on training information about the original sources. The advantages of the proposed method are as follows. 1) Analogous to the stereo signal concept, an imitated-stereo mixture is created from a single-channel mixture signal captured by one microphone, and the proposed algorithm then separates the individual sources from this mixture. 2) It overcomes the limitations associated with the NTF problems described above. Unlike NTF, our model assigns a probability distribution to each element (activation coefficient) of the unknown nonnegative activation matrix, indexed by audio component and time slot, together with a sparsity parameter associated with each probability distribution. This sets up a platform that enables the sparsity parameter to be individually optimized for each element code. 3) It automatically detects the optimal number of components for each individual source $j = 1, \ldots, J$, where $J$ is the maximum number of sources. It designates a prior distribution and determines the desirable number of components in the unknown basis by pruning the irrelevant components from it. Estimating each source with the proper number of components renders better separation performance than estimating it without the proper number. 4) It incorporates prior information on the basis vectors using the modified multivariate rectified Gaussian. This benefits the overall algorithm in terms of better estimation accuracy and more meaningful feature extraction pertaining to the data. Since each pattern in Y has its own features, designing an appropriate basis to match these features is imperative. If these features share some degree of correlation, this information should be captured to enable better part-based representations of each feature. Toward this end, we develop a modified Gaussian prior distribution on the basis to allow the proposed matrix factorization to capture the features of these patterns more efficiently. As our proposed method assigns a regularization parameter to each temporal code (individually optimized and adaptively tuned to yield the optimal sparse factorization), this Bayesian regularization improves the accuracy of resolving the spectral bases and the temporal codes, which was previously not possible using cNTF alone. The method takes advantage of combining the automatic detection of the optimal number of components, through the pruning technique, with the prior information on the basis. This results in separation performance that surpasses the conventional cNTF.
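The contributions above include detecting the number of components by pruning irrelevant ones. The paper's pruning is driven by its hierarchical Bayesian priors; the sketch below swaps in a much simpler, hypothetical energy-threshold rule (the `rel_threshold` parameter is illustrative) purely to show what removing a basis/activation pair from a factorization means.

```python
def prune_components(W, H, rel_threshold=1e-3):
    """Drop component k when the energy of its rank-1 term W[:, k] H[k, :] is tiny.

    The energy ||w_k||^2 * ||h_k||^2 is compared against rel_threshold times the largest one.
    W: F x K spectral bases (columns are components); H: K x T activations.
    Returns the pruned W, the pruned H and the indices of the kept components.
    """
    K = len(H)
    energies = []
    for k in range(K):
        basis_energy = sum(row[k] ** 2 for row in W)
        activation_energy = sum(h ** 2 for h in H[k])
        energies.append(basis_energy * activation_energy)
    peak = max(energies)
    keep = [k for k in range(K) if energies[k] >= rel_threshold * peak]
    W_kept = [[row[k] for k in keep] for row in W]
    H_kept = [list(H[k]) for k in keep]
    return W_kept, H_kept, keep

# Hypothetical factors in which component 1 is almost never active.
W_demo = [[1.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
H_demo = [[1.0, 2.0, 3.0], [1e-4, 0.0, 0.0], [2.0, 2.0, 2.0]]
print(prune_components(W_demo, H_demo)[2])
```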

The paper is organized as follows. Section 2 introduces the "imitated-stereo" mixture model along with the assumptions of the proposed method. The proposed demixing method and the formulation of the NTF algorithm are presented in Section 3. The separability of the mixture model is presented in Section 4. Experimental source separation results on musical data, coupled with a series of performance comparisons with other SCBSS techniques using datasets from the Real World Computing (RWC) [13] music database and the 2016 Signal Separation Evaluation Campaign (SiSEC) [39], are presented in Section 5. We finally conclude the paper in Section 6.

2 SINGLE CHANNEL MIXING MODEL

A. Imitated-Stereo Mixture Model

The single-channel blind source separation problem can be expressed as

$$x(t) = s_1(t) + s_2(t) + \cdots + s_J(t) \qquad (1)$$

where $x(t)$ is the single-channel observed mixture, $s_j(t)$ denotes the $j$th source signal, $j = 1, 2, \ldots, J$, $J$ is the total number of source signals and $t$ denotes the time index. To discover the original signals $s_j(t)$ given only the sole observed mixture $x(t)$, we compose another mixture based on the autoregressive (AR) process of the sources. Most audio signals can be modeled by an AR process. This enables us to propose the imitated mixture by time-shifting and weighting the observed mixture as





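$$\tilde{x}(t) = w\left(x(t) - \gamma\, x(t-\delta)\right) = w\left(s_1(t) + s_2(t) - \gamma\, s_1(t-\delta) - \gamma\, s_2(t-\delta)\right) \qquad (2)$$

where $\delta$ is the time shift and $w$ and $\gamma$ are weighting factors; the right-hand side is written out for the two-source case.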


References

Learning the parts of objects by non-negative matrix factorization

Learning parts of objects by non-negative matrix factorization

Algorithms for Non-negative Matrix Factorization

Fast and robust fixed-point algorithms for independent component analysis

Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values†
