
Proceedings ArticleDOI

A compressive sensing based compressed neural network for sound source localization

15 Jun 2011-pp 6-10

TL;DR: A family of new algorithms for the compression of NNs is presented, based on Compressive Sampling (CS) theory; the framework makes it possible to find a sparse structure for NNs, and the designed neural network is then compressed using CS.

Abstract: Microphone arrays are employed today to determine sound source locations in numerous real-time applications such as speech processing in large rooms or acoustic echo cancellation. Signal sources may exist in the near field or far field with respect to the microphones. Current Neural Network (NN) based source localization approaches assume far-field, narrowband sources. One of the important limitations of these NN-based approaches is striking a balance between computational complexity and the size of the NN; an architecture that is too large or too small will affect the performance in terms of generalization and computational cost. In previous analyses, saliency has been employed to determine the most suitable structure; however, it is time-consuming and the performance is not robust. In this paper, a family of new algorithms for the compression of NNs is presented, based on Compressive Sampling (CS) theory. The proposed framework makes it possible to find a sparse structure for NNs; the designed neural network is then compressed using CS. The key difference between our algorithm and the state-of-the-art techniques is that the mapping is continuously done using the most effective features; therefore, the proposed method has fast convergence. The empirical work demonstrates that the proposed algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity.

Summary (2 min read)

INTRODUCTION

  • In sound source localization techniques, the location of the source has to be estimated automatically by calculating the direction of the received signal [1].
  • Feature extraction is the process of selecting the data useful for estimating the DOA.
  • The key insight is the use of the instantaneous cross-power spectrum at each pair of sensors.
  • After this step, the authors compress the neural network that is designed with these feature vectors.
  • The next section presents a review of techniques for sound source localization.

II. SOUND SOURCE LOCALIZATION

  • The assumption of far-field sources remains true while the distance between the source and the reference microphone is larger than 2D²/λ_min [2] (Fig. 1), where λ_min is the minimum wavelength of the source signal and D is the microphone array length.
  • So, the time delay of the received signal between the reference microphone and the m-th microphone would be τ_m = (m − 1) d sin(φ)/c [15].
  • Therefore, τ₀ = d sin(φ)/c is the amount of time that the signal takes to traverse the distance between any two neighboring microphones (Fig. 1); in the near-field case the delay also depends on r, the distance between the source and the first microphone [15].

III. FEATURE SELECTION

  • The aim of this section is to compute the feature vectors from the array data and use the MLP (Multi Layer Perceptron) approximation property to map the feature vectors to the corresponding DOA, as shown in Fig. 3 [6] .
  • The authors summarize their algorithm for computing a real-valued feature vector of length (2(M − 1) + 1)K, for K dominant frequencies and M sensors; its first step is to calculate the N-point FFT of the signal at each sensor.
  • In conclusion, their purpose is to design a neural network with the least number of hidden neurons (or weights) that has the minimum increase in error, given by ‖O − Ô‖.
  • This problem is equivalent to finding a weight matrix most of whose rows are zeros.
  • Comparing these equations with (7), the authors conclude that these minimization problems can be written as CS problems.

VI. RESULTS AND DISCUSSION

  • As mentioned before, assuming that the received speech signals are modeled with 10 dominant frequencies, the authors trained a two-layer Perceptron neural network with 128 neurons in the hidden layer, using feature vectors obtained with CS from the cross-power spectrum of the received microphone signals.
  • After computing the network weights, the authors compressed the network with their algorithms.
  • From these results, the authors infer that the CS algorithms are faster than the other algorithms and achieve smaller errors.
  • Regarding the number of measurement vectors, the algorithm that uses a single measurement vector (SMV) is faster than the one that uses multiple measurement vectors (MMV), but its error is not smaller.

VII. CONCLUSION

  • In particular, using the pursuit and greedy methods of CS, a compression method for NNs has been presented.
  • The key difference between their algorithm and previous techniques is that the authors focus on the remaining elements of the neural network; their method converges quickly.
  • The simulation results demonstrate that their algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity.
  • The results revealed that the proposed algorithm can decrease computational complexity while improving performance.


A Compressive Sensing Based Compressed Neural
Network for Sound Source Localization
Mehdi Banitalebi Dehkordi
Speech Processing Research Lab
Elec. and Comp. Eng. Dept.,
Yazd University
Yazd, Iran
mahdi_Banitalebi@stu.yazduni.ac.ir
Hamid Reza Abutalebi
Elec. and Comp. Eng. Dept.,
Yazd University,
Yazd, Iran &
Idiap Research Institute,
Martigny, Switzerland
habutalebi@yazduni.ac.ir
Hossein Ghanei
Elec. and Comp. Eng. Dept.,
Yazd University,
Yazd, Iran
hghaneiy@yazduni.ac.ir
Abstract: Microphone arrays are employed today to determine sound source locations in numerous real-time applications such as speech processing in large rooms or acoustic echo cancellation. Signal sources may exist in the near field or far field with respect to the microphones. Current Neural Network (NN) based source localization approaches assume far-field, narrowband sources. One of the important limitations of these NN-based approaches is striking a balance between computational complexity and the size of the NN; an architecture that is too large or too small will affect the performance in terms of generalization and computational cost. In previous analyses, saliency has been employed to determine the most suitable structure; however, it is time-consuming and the performance is not robust. In this paper, a family of new algorithms for the compression of NNs is presented, based on Compressive Sampling (CS) theory. The proposed framework makes it possible to find a sparse structure for NNs; the designed neural network is then compressed using CS. The key difference between our algorithm and the state-of-the-art techniques is that the mapping is continuously done using the most effective features; therefore, the proposed method has fast convergence. The empirical work demonstrates that the proposed algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity.
Keywords- compressive sampling; sound source; neural
network; pruning; multilayer Perceptron; greedy algorithms.
I. INTRODUCTION
Location of a sound source is an important piece of information in speech signal processing applications. In sound source localization techniques, the location of the source has to be estimated automatically by calculating the direction of the received signal [1]. Most algorithms for these calculations are computationally intensive and difficult to implement in real time [2]. Neural network based techniques have been proposed to overcome the computational complexity problem by exploiting their massive parallelism [3, 4]. These techniques usually assume a narrowband, far-field source signal, which is not always applicable [2].

In this paper, we design a system that estimates the direction-of-arrival (DOA), the direction of the received signal, for far-field and near-field wideband sources. The proposed system uses feature extraction followed by a neural network. Feature extraction is the process of selecting the data useful for estimating the DOA; the estimation is performed with the use of CS. The neural network, which performs the pattern recognition step, computes the DOA to locate the sound source. The important key insight is the use of the instantaneous cross-power spectrum at each pair of sensors. The instantaneous cross-power spectrum is the cross-power spectrum calculated without any averaging over realizations. This step calculates the discrete Fourier transform (DFT) of the signals at all sensors. In the compressive sampling step, K coefficients of these DFTs are selected, and the DFT coefficients at the selected frequencies are then multiplied by the complex conjugates of the coefficients at the neighboring sensors. In comparison to other cross-power spectrum estimation techniques (which multiply each pair of DFT coefficients and average the results), we have reduced the computational complexity. After this step, we compress the neural network that is designed with these feature vectors. We propose a family of new algorithms based on CS to achieve this. The main advantage of this framework is that these algorithms are capable of iteratively building up the sparse topology while maintaining the training accuracy of the original, larger architecture. Experimental and simulation results show that by using NNs and CS we can design a compressed neural network that locates the sound source with acceptable accuracy.
The remainder of the paper is organized as follows. The next section presents a review of techniques for sound source localization. Section III explains feature selection and discusses the training and testing procedures of our sound source localization technique. Section IV describes traditional pruning algorithms and compressive sampling theory, and Section V contains the details of the new network pruning approach, describing the link between pruning NNs and CS and introducing two definitions for different sparse matrices. Experimental results are illustrated in Section VI, while Section VII concludes the paper.
II. SOUND SOURCE LOCALIZATION
Sound source localization is performed by the use of the DOA. The assumption of far-field sources remains true while the distance between the source and the reference microphone is larger than 2D²/λ_min [2] (Fig. 1), where λ_min is the minimum wavelength of the source signal and D is the microphone array length. Under this condition, incoming waves are approximately planar. So, the time delay of the received signal between the reference microphone and the m-th microphone would be [15]:

τ_m = (m − 1) d sin(φ) / c   (1)

In (1), d is the distance between two microphones, φ is the DOA, and c is the velocity of sound in air. Therefore, τ₀ = d sin(φ)/c is the amount of time that the signal takes to traverse the distance between any two neighboring microphones; Figs. 1 and 2 illustrate this fact.
Figure 1. Estimation of far-field source location

Figure 2. Estimation of near-field source location
If the distances between the source and the microphones are not far enough, then the time delay of the received signal between the reference microphone and the m-th microphone would be [15] (Fig. 2):

τ_m = ( √( r² + ((m − 1)d)² + 2 r (m − 1) d sin(φ) ) − r ) / c   (2)

where r is the distance between the source and the first (reference) microphone [15]. For r much larger than the array length, (2) reduces to the far-field delay of (1).
III. FEATURE SELECTION
The aim of this section is to compute the feature vectors
from the array data and use the MLP (Multi Layer Perceptron)
approximation property to map the feature vectors to the
corresponding DOA, as shown in Fig. 3 [6].
Figure 3. Multilayer Perceptron neural network for sound source
localization.
A feature vector must:
1. be mappable to the desired output (DOA);
2. be independent of the phase, frequency, bandwidth, and amplitude of the source;
3. be computationally efficient to calculate.
Assume that x_m(t) is the signal received at the m-th microphone and m = 1 is the reference microphone (τ₁ = 0). We can write the signal at the m-th microphone in terms of the signal at the first microphone as follows:

x_m(t) = x₁(t + τ_m)  ⟹  X_m(ω) = X₁(ω) e^{jωτ_m}   (3)

Then the cross-power spectrum between sensor m and sensor m + 1 is:

G_{m,m+1}(ω) = X_m(ω) X*_{m+1}(ω) = |X₁(ω)|² e^{−jωτ₀}   (4)

The normalized version is:

Ĝ_{m,m+1}(ω) = e^{−jω d sin(φ)/c}   (5)

This equation suggests that there exists a projection from ω_k and Ĝ_{m,m+1}(ω_k) (for k = 1, 2, …, N) to τ₀, and thus to the DOA. Therefore our aim is to use an MLP neural network to approximate this mapping.

We summarize our algorithm for computing a real-valued feature vector of length (2(M − 1) + 1)K, for K dominant frequencies and M sensors, below:

Preprocessing algorithm for computing a real-valued feature vector:
1. Calculate the N-point FFT of the signal at each sensor.
2. For m = 1, 2, …, M − 1:
2.1. Find the K largest FFT coefficients in absolute value for sensor m with compressive sampling.
2.2. Multiply the FFT coefficients for sensor m with the conjugates of the FFT coefficients at the same indices for sensor m + 1 to calculate the instantaneous estimate of the cross-power spectrum.
2.3. Normalize all the estimates by dividing them by their absolute values.
3. Construct a feature vector that contains the real and imaginary parts of the cross-power spectrum coefficients and their corresponding FFT indices.
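As an illustration of the preprocessing algorithm above, the sketch below computes one such feature vector; for simplicity it selects the K dominant bins at the reference sensor and reuses those indices for every pair, and the frame length and array size are assumed values.

```python
import numpy as np

def feature_vector(frames, K):
    """Compute the real-valued feature vector of length (2(M-1)+1)K from
    one length-N frame per sensor (frames has shape (M, N))."""
    M, N = frames.shape
    X = np.fft.rfft(frames, axis=1)        # step 1: N-point FFT per sensor
    idx = np.argsort(np.abs(X[0]))[-K:]    # step 2.1: K dominant bins
    feats = [idx / N]                      # step 3 also keeps the FFT indices
    for m in range(M - 1):
        # step 2.2: instantaneous cross-power spectrum (no averaging)
        G = X[m, idx] * np.conj(X[m + 1, idx])
        G = G / (np.abs(G) + 1e-12)        # step 2.3: normalize
        feats += [G.real, G.imag]          # step 3: real and imaginary parts
    return np.concatenate(feats)           # K + 2(M-1)K = (2(M-1)+1)K values

rng = np.random.default_rng(0)
print(feature_vector(rng.standard_normal((4, 128)), K=10).shape)  # (70,)
```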
We utilized a two-layer Perceptron neural network and trained it according to the fast backpropagation training algorithm [7]. For training the network, we used a simulated dataset of received signals. We modeled the received signal as a sum of cosines with random frequencies and phases, writing the received sampled signal at sensor m as:

x_m[n] = Σ_{i=1}^{P} cos(2π f_i n + θ_i − 2π f_i τ_m) + v[n]   (6)

where P is the number of cosines (we assumed P = 10), f_i is the frequency of the i-th cosine, θ_i is the initial phase of the i-th cosine, τ_m is the time delay between the reference microphone (m = 1) and the m-th microphone, and v[n] is white Gaussian noise; f_i is uniformly distributed over [200, 2000] and θ_i is uniformly distributed over [0, 2π]. We generated 100 independent sets of 128 sampled signals and then calculated the feature vectors; a total of 3600 input-output pairs are used to train the MLP. In the training step, after making the learning dataset and calculating the feature vectors, we use the compressive sampling algorithm to decrease the feature vector dimension. In the testing step, for a new received sampled signal, we calculate the feature vectors and estimate the DOA of the sound source. Our experiments show that the errors in classification and approximation have direct relations with the number of hidden neurons; Fig. 4 shows these relations for far-field and near-field sources.

Figure 4. Relation between number of hidden neurons and error
IV. TRADITIONAL PRUNING ALGORITHMS AND CS THEORY

Generally speaking, network pruning is often cast as three sub-procedures: (i) define and quantify the saliency of each element in the network; (ii) eliminate the least significant elements; (iii) re-adjust the remaining topology. With this knowledge, the following questions may appear in mind:
1) What is the best criterion to describe the saliency, or significance, of elements?
2) How to eliminate those unimportant elements with minimal increase in error?
3) How to make the method converge as fast as possible?
Methods for compressing NNs can be classified into two categories: 1) weight pruning, e.g., Optimal Brain Damage (OBD) [10], Optimal Brain Surgeon (OBS) [4], and Magnitude-based pruning (MAG) [12]; and 2) hidden neuron pruning, e.g., Skeletonization (SKEL) [8], non-contributing units (NC) [10], and the Extended Fourier Amplitude Sensitivity Test (EFAST) [2].

A new theory known as Compressed Sensing (CS) has recently emerged that can also be categorized as a type of dimensionality reduction. Like manifold learning, CS is strongly model-based (relying on sparsity in particular). This theory states that, for a given degree of residual error, CS guarantees the success of recovering the given signal from a small number of samples under some conditions [14].

According to the number of measurement vectors, the CS problem can be sorted into Single-Measurement Vector (SMV) or Multiple-Measurement Vector (MMV) problems. The SMV problem is expressed as follows. Given a measurement sample y and a dictionary D (the columns of D are referred to as atoms), we seek a vector solution satisfying:

min ‖α‖₀  s.t.  y = Dα   (7)

In the above equation, known as ℓ₀-norm minimization, ‖α‖₀ is the number of non-zero coefficients of α. Several iterative algorithms have been proposed to solve this minimization problem: greedy algorithms such as Orthogonal Matching Pursuit (OMP) or Matching Pursuit (MP), and non-convex local optimization like the FOCUSS algorithm [16].
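For concreteness, here is a compact sketch of one of the greedy solvers named above, Orthogonal Matching Pursuit, applied to the SMV problem (7); the dictionary sizes in the demo are arbitrary.

```python
import numpy as np

def omp(D, y, S, tol=1e-9):
    """Greedy approximation of eq. (7): pick at most S atoms of the
    dictionary D, re-fitting the coefficients by least squares."""
    support = []
    residual = y.astype(float).copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(S):
        k = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef          # orthogonal re-fit
        if np.linalg.norm(residual) < tol:
            break
    alpha[support] = coef
    return alpha

# demo: try to recover a 3-sparse vector from 40 random measurements
rng = np.random.default_rng(1)
D = rng.standard_normal((40, 100))
alpha_true = np.zeros(100)
alpha_true[[5, 42, 77]] = [1.0, -2.0, 0.5]
alpha_hat = omp(D, D @ alpha_true, S=3)
print(np.flatnonzero(alpha_hat))  # ideally [5, 42, 77]
```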
V. PROBLEM FORMULATION AND METHODOLOGY

Before we formulate the problem of network pruning as a compressive sampling problem, we introduce some definitions [11, 10]:
1) If, for every column of a matrix, the number of non-zero elements is smaller than S, then the matrix is called an S-sparse matrix.
2) If the number of rows that contain non-zero elements is smaller than S, then the matrix is called an S-row-sparse matrix.
We assume that the training input patterns are stored in a matrix X and the desired output patterns are stored in a matrix O. Then the mathematical model for the training of the neural network can be expressed in the form of the following expansion:

O = f₂(W₂ f₁(W₁X + B₁) + B₂)   (8)
where O is the output matrix of the neural network, H = f₁(W₁X + B₁) is the output matrix of the hidden layer, W₁ and W₂ are the weight matrices of the two layers, and B₁ and B₂ are the bias terms.
In conclusion, our purpose is to design a neural network with the least number of hidden neurons (or weights) that has the minimum increase in error, given by ‖O − Ô‖. When we minimize a weight matrix (W₁ or W₂), the behavior is, from a mathematical viewpoint, like setting the corresponding elements of W₁ or W₂ to zero. From the above, the goal of finding the smallest number of weights in an NN within a range of accuracy can be considered equal to finding an S-sparse matrix W₁ or W₂. So we can write the problem as below:

min ‖W₂‖₀  s.t.  O = f₂(W₂H + B₂)
min ‖W₁‖₀  s.t.  O = f₂(W₂ f₁(W₁X + B₁) + B₂)   (9)
This problem is equivalent to finding a W₁ most of whose rows are zeros. So, with the definition of the S-row-sparse matrix, we can rewrite the problem as below:

min ‖W₁‖_{row-0}  s.t.  O = f₂(W₂ f₁(W₁X + B₁) + B₂)   (10)
In matrix form, equations (9) and (10) can be written as:

min ‖W₂ᵀ‖₀  s.t.  [f₂⁻¹(O) − B₂]ᵀ = Hᵀ W₂ᵀ
min ‖W₁ᵀ‖₀  s.t.  [f₁⁻¹(H) − B₁]ᵀ = Xᵀ W₁ᵀ   (11)

min ‖W₁ᵀ‖_{row-0}  s.t.  [f₁⁻¹(H) − B₁]ᵀ = X̃ᵀ W₁ᵀ   (12)
in which X̃ is the input matrix of the hidden layer for the compressed neural network. Comparing these equations with (7), we can conclude that these minimization problems can be written as CS problems. In these CS equations, Hᵀ, Xᵀ and X̃ᵀ are used as the dictionary matrices, while [f₂⁻¹(O) − B₂]ᵀ and [f₁⁻¹(H) − B₁]ᵀ play the role of the signal matrices. The process of compressing NNs can be regarded as finding different sparse solutions for the weight matrices W₂ᵀ or W₁ᵀ.
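To make the link between (11) and (7) concrete, the toy sketch below prunes the output layer of a small "trained" network by solving one SMV problem per output row with the omp() routine sketched in Section IV; the layer sizes, the tanh hidden activation, and the linear output layer are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy "trained" two-layer network with a linear output layer, so that
# O = W2 @ H with H = tanh(W1 @ X + B1) and B2 = 0 for simplicity.
X = rng.standard_normal((8, 400))                  # training inputs
W1 = rng.standard_normal((128, 8))
B1 = rng.standard_normal((128, 1))
H = np.tanh(W1 @ X + B1)                           # hidden-layer output matrix
W2 = np.zeros((2, 128))                            # sparse output weights
cols = rng.choice(128, size=10, replace=False)
W2[:, cols] = rng.standard_normal((2, 10))
O = W2 @ H                                         # desired output matrix

# CS view of (11): dictionary H^T, signal O^T, unknown sparse rows of W2.
W2_hat = np.zeros_like(W2)
for j in range(O.shape[0]):                        # one SMV problem per row
    W2_hat[j] = omp(H.T, O[j], S=10)               # omp() from Section IV
print(np.count_nonzero(W2_hat), float(np.linalg.norm(W2_hat - W2)))
```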
VI. RESULTS AND DISCUSSION
As mentioned before, assuming that the received speech signals are modeled with 10 dominant frequencies, we trained a two-layer Perceptron neural network with 128 neurons in the hidden layer, using feature vectors obtained with CS from the cross-power spectrum of the received microphone signals. After computing the network weights, we compressed the network with our algorithms.

In order to compare our results with the previous algorithms, we have used SNNS (a simulator for NNs, available at [19]). All of the traditional algorithms, such as Optimal Brain Damage (OBD) [16], Optimal Brain Surgeon (OBS) [17], Magnitude-based pruning (MAG) [18], Skeletonization (SKEL) [6], non-contributing units (NC) [7] and the Extended Fourier Amplitude Sensitivity Test (EFAST) [13], are available in SNNS. (CSS1 is the name of our algorithm that uses SMV for sparse representation, and CSS2 is another technique that uses MMV for sparse representation.)

Tables I and II demonstrate the results of the simulations. In Table I we compare the algorithms on a classification problem, and in Table II we compare them on an approximation problem. For the classification problem, we compare the sum of the hidden-neuron weights across algorithms under the same stopping rule for training; we also compare the classification error and the duration of the training epochs. In Table II we compare the number of hidden neurons, the approximation error, and the duration of the training epochs, again with the same stopping rule. From these outputs we can infer that the CS algorithms are faster than the other algorithms and achieve smaller errors. Furthermore, CSS1 is faster than CSS2 and has smaller computational complexity. That is, regarding the number of measurement vectors, the algorithm that uses a single measurement vector (SMV) is faster than the one that uses multiple measurement vectors (MMV), but its error is not smaller.
TABLE I. COMPARISON FOR DIFFERENT ALGORITHMS (CLASSIFICATION)

Training epochs = 50       MAG      OBS      OBD      CSS1
Sum of neuron weights      3261     3109     2401     780
Classification error       0.0537   0.0591   0.046    0.0043
Training epoch time (s)    0.62     25.64    23.09    0.41

TABLE II. COMPARISON FOR DIFFERENT ALGORITHMS (APPROXIMATION)

Training epochs = 50       NC       SKEL     EFAST    CSS2
Hidden neurons             127      128      7        6
Approximation error        0.094    0.081    0.016    0.0023
Training epoch time (s)    27.87    7.86     9.97     14.87
VII. CONCLUSION
In this paper, compressive sampling is utilized to design NNs. In particular, using the pursuit and greedy methods of CS, a compression method for NNs has been presented. The key difference between our algorithm and previous techniques is that we focus on the remaining elements of the neural network; our method converges quickly. The simulation results demonstrate that our algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity. The results reveal that the proposed algorithm decreases computational complexity while improving performance.
REFERENCES
[1] R. Reed, "Pruning algorithms-a survey", IEEE Transactions on Neural
Networks, vol. 4, pp. 740-747, May, 1993.
[2] P. Lauret, E. Fock, T. A. Mara, "A node pruning algorithm based on a
Fourier amplitude sensitivity test method", IEEE Transactions on
Neural
Networks, vol. 17, pp. 273-293, March, 2006.
[3] B. Hassibi and D. G. Stork, "Second-order derivatives for network
pruning: optimal brain surgeon," Advances in Neural Information
Processing
Systems, vol. 5, pp. 164-171, 1993.

[4] L. Prechelt, Proben1: A set of neural network benchmark problems and benchmarking rules. University of Karlsruhe, Germany, Tech. Rep. 21/94, 1994.
[5] M. Hagiwara, "Removal of hidden units and weights for back
propagation networks," Proceeding IEEE International Joint Conference
on Neural Network, vol. 1, pp. 351-354, Aug. 2002.
[6] M. Mozer and P. Smolensky, "Skeletonization: a technique for trimming the fat from a network via relevance assessment," Advances in Neural Information Processing Systems, vol. 1, pp. 107-115, 1991.
[7] J. Sietsma and R. Dow, "Creating artificial neural networks that generalize," Neural Networks, vol. 4, no. 1, pp. 67-79, 1991.
[8] E. J. Candes, J. Romberg, T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, pp. 489-509, Jan. 2006.
[9] J. Haupt, R. Nowak, "Signal reconstruction from noisy random
projections," IEEE Transaction on Information Theory, vol. 52, pp.
4036-4048, Aug. 2006.
[10] Y. H. Liu, S. W. Luo, A. J. Li, "Information geometry on pruning of
neural network," International Conference on Machine Learning and
Cybernetics, Shanghai, Aug. 2004.
[11] J. Yang, A. Bouzerdoum, S. L. Phung, "A neural network pruning
approach based on compressive sampling," Proceedings of International
Joint Conference on Neural Networks, pp. 3428-3435, New Jersey,
USA, Jun. 2009.
[12] H. Rauhut, K. Schnass, P. Vandergheynst, "Compressed sensing and redundant dictionaries," IEEE Transactions on Information Theory, vol. 54, pp. 2210-2219, May 2008.
[13] T. Xu and W. Wang, "A compressed sensing approach for
underdetermined blind audio source separation with sparse
representation," 2009.
[14] J. Laurent, P. Yand, and P. Vandergheynst, "Compressed sensing: when
sparsity meets sampling," Feb. 2010.
[15] G. Arslan, F. A. Sakarya, B. L. Evans, "Speaker localization for far field
and near field wideband sources using neural networks," Proc. IEEE-
EURASIP Workshop on Nonlinear Signal and Image Processing, vol. 2,
pp. 569-573, Antalya, Turkey, Jun. 1999.
[16] Y. Le. Cun, J. S. Denker, and S. A. Solla, "Optimal brain damage,"
Advance Neural Information. Process. Systems, vol. 2, pp. 598-605,
1990.
[17] B. Hassibi and D. G. Stork, "Second-order derivatives for network
pruning: optimal brain surgeon,” Advances in Neural Information
Processing Systems, vol. 5, pp. 164-171, 1993.
[18] M. Hagiwara, "Removal of hidden units and weights for back
propagation networks," Proc. IEEE Int. Joint Conf. Neural Network, pp.
351-354, 1993.
[19] SNNS software, available at http://www.ra.cs.uni-tuebingen.de/SNNS/.
Citations

Journal ArticleDOI
TL;DR: It is shown how DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm for the Simulated and the Real subsets.
Abstract: In the field of human speech capturing systems, a fundamental role is played by the source localization algorithms. In this paper a Speaker Localization algorithm (SLOC) based on Deep Neural Networks (DNN) is evaluated and compared with state-of-the art approaches. The speaker position in the room under analysis is directly determined by the DNN, leading the proposed algorithm to be fully data-driven. Two different neural network architectures are investigated: the Multi Layer Perceptron (MLP) and Convolutional Neural Networks (CNN). GCC-PHAT (Generalized Cross Correlation-PHAse Transform) Patterns, computed from the audio signals captured by the microphone are used as input features for the DNN. In particular, a multi-room case study is dealt with, where the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested by means of the home recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In detail, the focus goes to speaker localization task in two distinct neighboring rooms. As term of comparison, two algorithms proposed in literature for the addressed applicative context are evaluated, the Crosspower Spectrum Phase Speaker Localization (CSP-SLOC) and the Steered Response Power using the Phase Transform speaker localization (SRP-SLOC). Besides providing an extensive analysis of the proposed method, the article shows how DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error, expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm, respectively, for the Simulated and the Real subsets.

25 citations


Journal ArticleDOI
TL;DR: A source localization algorithm based on a sparse Fast Fourier Transform-based feature extraction method and spatial sparsity which leads to a sparse representation of audio signals and a significant reduction in the dimensionality of the signals.
Abstract: In this paper, we propose a source localization algorithm based on a sparse Fast Fourier Transform (FFT)-based feature extraction method and spatial sparsity. We represent the sound source positions as a sparse vector by discretely segmenting the space with a circular grid. The location vector is related to microphone measurements through a linear equation, which can be estimated at each microphone. For this linear dimensionality reduction, we have utilized a Compressive Sensing (CS) and two-level FFT-based feature extraction method which combines two sets of audio signal features and covers both short-time and long-time properties of the signal. The proposed feature extraction method leads to a sparse representation of audio signals. As a result, a significant reduction in the dimensionality of the signals is achieved. In comparison to the state-of-the-art methods, the proposed method improves the accuracy while the complexity is reduced in some cases.

14 citations


Cites background or methods or result from "A compressive sensing based compres..."

  • ...In next step, to evaluate the proposed sound source localization system, we have compared its performance with those for two of the previously-reported CS-based target localization algorithms, namely DTL [8] and CSNN [9]....


  • ...The Compressive Sensing-based Neural Network (CSNN) method [9] employs a neural network for the calculation of spectral feature vectors in each microphone....


  • ...In [9], authors have tried to reduce computational complexity by employing a feature extraction process that selects useful data for estimation of DOA....


  • ...Comparison between the localization performance of the proposed system, CSNN [9] and DTL algorithm [8] in the case of two sound sources and two microphones....



Journal ArticleDOI
TL;DR: A new modeling and analysis framework for the multipatient positioning in a wireless body area network (WBAN) which exploits the spatial sparsity of patients and a sparse fast Fourier transform (FFT)-based feature extraction mechanism for monitoring of Patients and for reporting the movement tracking to a central database server containing patient vital information is presented.
Abstract: Recent achievements in wireless technologies have opened up enormous opportunities for the implementation of ubiquitous health care systems in providing rich contextual information and warning mechanisms against abnormal conditions. This helps with the automatic and remote monitoring/tracking of patients in hospitals and facilitates and with the supervision of fragile, elderly people in their own domestic environment through automatic systems to handle the remote drug delivery. This paper presents a new modeling and analysis framework for the multipatient positioning in a wireless body area network (WBAN) which exploits the spatial sparsity of patients and a sparse fast Fourier transform (FFT)-based feature extraction mechanism for monitoring of patients and for reporting the movement tracking to a central database server containing patient vital information. The main goal of this paper is to achieve a high degree of accuracy and resolution in the patient localization with less computational complexity in the implementation using the compressive sensing theory. We represent the patients' positions as a sparse vector obtained by the discrete segmentation of the patient movement space in a circular grid. To estimate this vector, a compressive-sampling-based two-level FFT (CS-2FFT) feature vector is synthesized for each received signal from the biosensors embedded on the patient's body at each grid point. This feature extraction process benefits in the combination of both short-time and long-time properties of the received signals. The robustness of the proposed CS-2FFT-based algorithm in terms of the average positioning error is numerically evaluated using the realistic parameters in the IEEE 802.15.6-WBAN standard in the presence of additive white Gaussian noise. Due to the circular grid pattern and the CS-2FFT feature extraction method, the proposed scheme represents a significant reduction in the computational complexity, while improving the level of the resolution and the localization accuracy when compared to some classical CS-based positioning algorithms.

9 citations


Cites methods from "A compressive sensing based compres..."

  • ...Localization performance of (a) the proposed scheme, (b) DTL algorithm in [10], and (c) CSNN algorithm in [23], for three patients and six receiver nodes....


  • ...We compare the performance of the CS-2FFT-based scheme with that of two CS-based target localization algorithms, namely DTL [10] and CS-based neural network (CSNN) [23]....


  • ...EML algorithms in [10], [23], and [24] in the case of three patients and six receiver nodes....


  • ...pared to other classical positioning algorithms such as the EML, DTL, and CSNN approaches in [10] and [23]....




Journal ArticleDOI
TL;DR: A novel sound source localization method based on compressive sensing theory that can directly determine the number of sound sources in one step and successfully estimate the source positions in noisy and reverberant environments is proposed.
Abstract: Sound source localization with less data is a challenging task. To address this problem, a novel sound source localization method based on compressive sensing theory is proposed in this paper. Specifically, a sparsity basis is first constructed for each microphone by shifting the audio signal recorded from one reference microphone. In this manner, the microphones except the reference one are allowed to capture audio signals under the sampling rate far below the Nyquist criterion. Next, the source positions are estimated by solving an $$l_1$$ minimization based on each frame of audio signals. Finally, a fine localization scheme is presented by fusing the estimated source positions from multiple frames. The proposed method can directly determine the number of sound sources in one step and successfully estimate the source positions in noisy and reverberant environments. Experimental results demonstrate the validity of the proposed method.

References

Journal ArticleDOI
Abstract: This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal f ∈ C^N and a randomly chosen set of frequencies Ω. Is it possible to reconstruct f from the partial knowledge of its Fourier coefficients on the set Ω? A typical result of this paper is as follows. Suppose that f is a superposition of |T| spikes f(t) = Σ_{τ∈T} f(τ)δ(t − τ) obeying |T| ≤ C_M · (log N)^{−1} · |Ω| for some constant C_M > 0. We do not know the locations of the spikes nor their amplitudes. Then with probability at least 1 − O(N^{−M}), f can be reconstructed exactly as the solution to the ℓ₁ minimization problem. In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for C_M which depend on the desired probability of success. Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of |T| spikes may be recovered by convex programming from almost every set of frequencies of size O(|T| · log N). Moreover, this is nearly optimal in the sense that any method succeeding with probability 1 − O(N^{−M}) would in general require a number of frequency samples at least proportional to |T| · log N. The methodology extends to a variety of other situations and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples, provided that the number of jumps (discontinuities) obeys the condition above, by minimizing other convex functionals such as the total variation of f.

13,375 citations


Proceedings Article
01 Jan 1989
TL;DR: A class of practical and nearly optimal schemes for adapting the size of a neural network by using second-derivative information to make a tradeoff between network complexity and training set error is derived.
Abstract: We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.

3,354 citations


"A compressive sensing based compres..." refers methods in this paper

  • ...Several iterative algorithms have been proposed to solve this minimization problem (Greedy Algorithms such as Orthogonal Matching Pursuit (OMP) or Matching Pursuit (MP) and Non-convex local optimization like FOCUSS algorithm [16])....


  • ...All of the traditional algorithms, such as Optimal Brain Damage (OBD) [16], Optimal Brain Surgeon (OBS) [17], and Magnitude-based pruning (MAG) [18], Skeletonization (SKEL) [6], non-contributing units (NC) [7] and Extended Fourier Amplitude Sensitivity Test (EFAST) [13], are available in SNNS (CSS1 is name of algorithm that uses SMV for sparse representation and CSS2 is another technique that uses MMV for sparse representation)....



Journal ArticleDOI
R. Reed
TL;DR: The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
Abstract: A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.

1,592 citations


"A compressive sensing based compres..." refers background in this paper

  • ...In the sound source localization techniques, location of the source has to be estimated automatically by calculating the direction of the received signal [1]....



Proceedings Article
30 Nov 1992
TL;DR: Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case, and thus yields better generalization on test data.
Abstract: We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is Significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H-1 from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.

1,565 citations


"A compressive sensing based compres..." refers background in this paper

  • ...Neural network based techniques have been proposed to overcome the computational complexity problem by exploiting their massive parallelism [3,4]....



Journal ArticleDOI
TL;DR: A practical iterative algorithm for signal reconstruction is proposed, and potential applications to coding, analog-digital (A/D) conversion, and remote wireless sensing are discussed.
Abstract: Recent results show that a relatively small number of random projections of a signal can contain most of its salient information. It follows that if a signal is compressible in some orthonormal basis, then a very accurate reconstruction can be obtained from random projections. This "compressive sampling" approach is extended here to show that signals can be accurately recovered from random projections contaminated with noise. A practical iterative algorithm for signal reconstruction is proposed, and potential applications to coding, analog-digital (A/D) conversion, and remote wireless sensing are discussed

647 citations


Frequently Asked Questions (1)
Q1. What have the authors contributed in "A compressive sensing based compressed neural network for sound source localization" ?

In this paper, a family of new algorithms for compression of NNs is presented based on Compressive Sampling ( CS ) theory.