Single-Channel Signal Separation Using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization



 !"#$%&%
$% $' ( $'%  % ) $' * 
+,  $-  $%  ./ !# '' 01/2&0/!2 3$$
1/& /!4
%-$'
( '555! ! 15 .6& !"& !!02&6 7'555! ! 15 .6& !"&
!!02&68
 * ) )% 9   
'55%55'5."/2.5
(*-*%'#%
(*-:''-;%9
-*%#5'-)$%'
99%%''%-'9*'-
99'%-%&9&'<''
) ' '   '*   %  9%% %'
%*)%%-'%5(%'
-)-+%%%%%--
9)9%'9'-% 9%%'%-
*%%%'55%5'%%
-=9<%'%*9
*%%%)'%'%59
'%* 9'% *'%:) '
->#

1
P. Parathai¹, N. Tengtrairat², W. L. Woo³, and Bin Gao⁴
Abstract -- A novel approach for solving single-channel signal separation (SCSS) is presented, based on the
proposed sparse nonnegative tensor factorization under the framework of maximum a posteriori
probability and adaptively fine-tuned using a hierarchical Bayesian approach with a new mixture
model. The mixture model is analogous to a stereo signal captured by one real and one virtual
microphone. An “imitated-stereo” mixture model is thus developed by weighting and time-shifting the
original single-channel mixture. This leads to an artificial dual-channel mixing system, which gives rise
to a new form of spectral basis correlation diversity of the sources. Underlying all factorization algorithms
is the principal difficulty of estimating the adequate number of latent components for each signal. This
paper addresses this issue by developing a framework for pruning unnecessary components and
incorporating modified multivariate rectified Gaussian prior information into the spectral basis features.
The parameters of the imitated-stereo model are estimated via the proposed sparse nonnegative tensor
factorization with the Itakura-Saito divergence. In addition, the separability conditions of the proposed
mixture model are derived, and it is demonstrated that the proposed method can separate real-time captured
mixtures. Experimental testing on real audio sources has been conducted to verify the capability of the
proposed method.
Keywords -- Blind source separation, underdetermined mixture, tensor factorization, unsupervised
learning, multiplicative updates, source modeling.
¹ School of Software Engineering, Payap University, Chiang Mai, Thailand: phetcharat@payap.ac.th
² School of Software Engineering, Payap University, Chiang Mai, Thailand: naruephorn_t@payap.ac.th
³ Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, England, United Kingdom: wai.l.woo@northumbia.ac.uk
⁴ School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China: bin_gao@uestc.edu.cn
1 INTRODUCTION
Blind source separation (BSS) [29, 47] is the process of separating individual source signals without using
training information about the sources. BSS has been applied in numerous fields, including underwater signal processing
[31], communication [27], speech enhancement [37], biomedical [14] and audio signal recognition [42]. One
classical BSS problem is the so-called "cocktail party problem" [4], a psychoacoustic phenomenon that
refers to the remarkable human ability to attend to and recognize a speaker in an interfering
environment. An extreme case of BSS is single-channel blind source separation (SCBSS). SCBSS
aims to recover individual source signals from a single mixture recording without any a priori information about the
sources. Since the number of source signals $\{s_i(t)\}_{i=1}^{I}$ is greater than the number of observed
mixtures $y(t)$, this is known as the underdetermined SCBSS problem [2, 12, 20, 30, 33, 44]. Many algorithms have
been successfully developed for SCBSS. The conventional ICA method [19] was adapted to the case of SCBSS,
giving rise to single-channel ICA (SCICA). In [1, 21, 28, 40], an SCICA method is proposed that maps an
observed single-channel mixture into a multi-channel model by breaking the observed vector into a sequence of
contiguous blocks. These blocks are treated as a matrix, to which standard ICA can then be applied to estimate
the underlying sources. The SCICA method has two major drawbacks: first, the algorithm assumes
stationary sources; and second, the sources are assumed to be disjoint in the frequency domain. These assumptions
do not always hold in practice. In the SCICA method, the sources are modeled as sparse combinations
of a set of time-domain basis functions which are initially derived using standard ICA. This method yields
optimal separation when the ICA basis functions corresponding to each source have minimal time-domain
overlap. When the basis functions overlap significantly, e.g. in a mixture of two speech
sources, or when the basis functions of two sources are very similar, the method performs poorly. In [46],
multi-component radar or signal-dependent transforms [10, 32] were applied to a single-channel mixture to generate a
multi-channel mixture. The generated multi-channel mixtures are subsequently separated by ICA. Another approach to
decomposing a signal of interest into different sources is nonnegative matrix factorization (NMF) [24].
NMF has been used for sound source separation of single-channel mixtures, with the multiplicative update
(MU) algorithm solving the parametric optimization under the least-squares distance and the Kullback-Leibler
divergence as cost functions [25, 34, 35]. Later, other families of cost functions were proposed, for
example the Beta divergence [22], Csiszár’s divergences [5], and the Itakura-Saito divergence [7]. A popular method in
this category is sparse nonnegative matrix factorization (SNMF) [15], in which sparseness constraints are
included in the cost function. The two-dimensional sparse NMF deconvolution (SNMF2D) [3, 11] uses a double
convolution to model both the spreading of the spectral basis and the variation of temporal structure inherent in the signals. In
[23], sources are assumed to be non-stationary and nonnegative. The canonical tensor and least-squares method is
used to estimate the mixing model. The source is then recovered by a minimum mean-squared error beamformer
approach without any hypothetical limitation. In a parallel development, nonnegative tensor factorization (NTF) under a parallel factor analysis
(PARAFAC) structure, where the channel spectrograms are jointly modeled by a 3-valence tensor, has been
introduced in [8, 36]. Clustering of the spatial cues to group the NTF components (cNTF) is developed in [6] for
multichannel audio source separation. In most applications, if the number of components is too small, the data
does not fit the model well. Conversely, if it is too large, overfitting occurs. Choosing the right model is
particularly challenging in the PARAFAC model, as the number of components is specified for each modality
separately. While these approaches increase the accuracy of matrix factorization, they work only when a large sample
dataset is available. Moreover, the sparsity parameter is determined manually, which can cause over- or
under-sparsity that degrades separation performance. To resolve this trade-off between data fidelity
and overfitting, it is crucial that the “right” model order of components is selected.
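The multiplicative-update NMF discussed above can be illustrated with a minimal sketch for the least-squares cost, using the standard Lee and Seung update rules. It is plain Python for illustration only: the function names (`nmf_mu`, `frobenius_error`), the random initialization, and the iteration count are assumptions and are not taken from the cited methods. Note that the number of components `K` has to be supplied by hand, which is exactly the model-order difficulty described above.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def nmf_mu(V, K, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative F x T matrix V as W (F x K) times H (K x T)."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    # Strictly positive random start (an arbitrary choice for this sketch).
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(K)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][t] * num[k][t] / (den[k][t] + eps) for t in range(T)]
             for k in range(K)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(K)]
             for f in range(F)]
    return W, H


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H (the least-squares cost)."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, R) for v, r in zip(rv, rr))
```

Because every update multiplies the current value by a ratio of nonnegative terms, `W` and `H` stay nonnegative throughout, and the least-squares cost does not increase from one iteration to the next.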
In this paper, a new framework for single-channel blind source separation (SCBSS) is proposed. The proposed
solution separates sources from a single channel without relying on training information about the original
sources. The advantages of the proposed method are: 1) Analogy to the stereo signal concept using one
microphone. We create an imitated-stereo mixture from a single-channel mixture signal. From this stereo mixture,
the proposed algorithm can separate the individual sources. 2) Overcoming the
limitations associated with the above NTF problems. Unlike NTF, our model assigns a probability distribution
to each element of the unknown nonnegative matrix of activation coefficients, indexed by audio component and
time slot, and a sparsity parameter to each probability distribution.
This sets up a platform that enables the sparsity parameter to be individually optimized for each element code. 3)
Automatically detecting the optimal number of components of each individual source. The method places a prior
distribution on the components and determines the desirable number in an unknown basis by pruning the
irrelevant ones. Estimating the source with the proper number of components yields better separation
performance than estimating it without. 4) Incorporating prior
information on the basis vectors using the modified multivariate rectified Gaussian. This benefits the overall
algorithm in terms of better estimation accuracy and more meaningful feature extraction pertaining to the data.
Since each pattern in Y has its own features, designing an appropriate basis to match these features is imperative.
If these features share some degree of correlation, then this information should be captured to enable better
part-based representations of each feature. To this end, we develop a modified Gaussian prior distribution on
the basis to allow the proposed matrix factorization to capture the features of these patterns more efficiently. Because our
proposed method assigns a regularization parameter to each temporal code (which is individually optimized and
adaptively tuned to yield the optimal sparse factorization), this Bayesian regularization improves the accuracy in
resolving the spectral bases and the temporal codes, which was previously not possible using cNTF alone. The method
combines the automatic detection of the optimal number of components, through the pruning
technique, with the prior information on the basis. This results in separation performance that surpasses that of
conventional cNTF.
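To make the idea of pruning irrelevant components concrete, the following sketch drops components that contribute a negligible share of the reconstructed energy. It is a simple illustrative heuristic and not the prior-based pruning rule of the proposed method; the function name `prune_components`, the energy measure, and the threshold `rel_tol` are all assumptions introduced here.

```python
def prune_components(W, H, rel_tol=1e-3):
    """Drop components with a negligible share of the total energy.

    W is an F x K basis matrix and H a K x T activation matrix, both stored
    as lists of rows. Returns the pruned W, the pruned H, and the indices of
    the components that were kept.
    """
    K = len(H)
    energy = []
    for k in range(K):
        # Squared Frobenius norm of the rank-one term w_k h_k^T,
        # which equals ||w_k||^2 * ||h_k||^2.
        w_sq = sum(row[k] ** 2 for row in W)
        h_sq = sum(h ** 2 for h in H[k])
        energy.append(w_sq * h_sq)
    total = sum(energy)
    if total == 0:
        kept = []
    else:
        kept = [k for k in range(K) if energy[k] / total > rel_tol]
    W_pruned = [[row[k] for k in kept] for row in W]
    H_pruned = [list(H[k]) for k in kept]
    return W_pruned, H_pruned, kept
```

With such a rule the factorization can be started with a generous number of components and reduced afterwards, instead of fixing the model order in advance.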
The paper is organized as follows. Section 2 introduces the “imitated-stereo” mixture model along with the
assumptions of the proposed method. The proposed demixing method and the formulation of the NTF algorithm
are presented in Section 3. The separability of the mixture model is presented in Section 4. Section 5 presents
experimental source separation results on musical data, together with a series of performance comparisons with
other SCBSS techniques, using datasets from the Real World Computing (RWC) [13] music database and the 2016
Signal Separation Evaluation Campaign (SiSEC) [39]. Section 6 concludes the paper.
2 SINGLE CHANNEL MIXING MODEL
A. Imitated-Stereo Mixture Model
The single-channel blind source separation problem can be expressed as
$$y(t) = s_1(t) + s_2(t) + \cdots + s_I(t) \qquad (1)$$
where $y(t)$ is the single-channel observed mixture, $s_i(t)$ denotes the $i$th source signal, $i = 1, \ldots, I$, $I$ is the total number
of source signals and $t$ denotes the time index. To recover the original signals $s_i(t)$ given only
the sole observed mixture $y(t)$, we compose another mixture based on the autoregressive (AR) process of the
sources. Most audio signals can be modeled by an AR process. This enables us to propose the imitated
mixture by time-shifting and weighting the observed mixture as
[Equation (2): the imitated mixture expressed in terms of the weighted and time-shifted observed mixture; the symbols are not recoverable from this copy.] (2)
