
Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors


IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1
Alastair H. Moore, Member, IEEE, Christine Evers, Senior Member, IEEE, and Patrick A. Naylor, Senior
Member, IEEE
Abstract—Direction of Arrival (DOA) estimation is a funda-
mental problem in acoustic signal processing. It is used in a
diverse range of applications, including spatial filtering, speech
dereverberation, source separation and diarization. Intensity
vector-based DOA estimation is attractive, especially for spherical
sensor arrays, because it is computationally efficient. Two such
methods are presented which operate on a spherical harmonic
decomposition of a sound field observed using a spherical micro-
phone array. The first uses Pseudo-Intensity Vectors (PIVs) and
works well in acoustic environments where only one sound source
is active at any time. The second uses Subspace Pseudo-Intensity
Vectors (SSPIVs) and is targeted at environments where multiple
simultaneous sources and significant levels of reverberation make
the problem more challenging. Analytical models are used to
quantify the effects of an interfering source, diffuse noise and
sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation
using PIVs and SSPIVs is compared against the state-of-the-art in
simulations including realistic reverberation and noise for single
and multiple, stationary and moving sources. Finally, robust
performance of the proposed methods is demonstrated using
speech recordings in real acoustic environments.
Index Terms—Direction of arrival estimation, DOA local-
ization, speaker tracking, robot audition, microphone array
processing, spherical microphone array, spherical harmonics
I. INTRODUCTION
MANY applications of acoustic signal processing rely
on Direction of Arrival (DOA) estimation, including
spatial filtering, speech dereverberation, source separation and
diarization. Estimation of the DOA of a sound source is particu-
larly important in the context of robot audition where tracking
the directions of one or more moving sources enables an
‘awareness’ of the local environment, which is a requirement
for effective human-robot interaction.
To estimate both the vertical and horizontal angles of
arrival requires a three-dimensional microphone array. Array
geometries which sample the sound field such that it can
be represented in the Spherical Harmonic (SH) domain are
attractive because this representation allows the sound field to
be analyzed with equal resolution in all directions using
algorithms which are independent of the specific array geometry
[1]–[4].

A. H. Moore, C. Evers and P. A. Naylor are with the Department of
Electrical and Electronic Engineering, Imperial College London, London SW7
2AZ, U.K. (e-mail: alastair.h.moore@imperial.ac.uk; c.evers@imperial.ac.uk;
p.naylor@imperial.ac.uk).
The research leading to these results has received funding from the
European Union’s Seventh Framework Programme (FP7/2007-2013) under
grant agreement no. 609465.
This work was supported by the Engineering and Physical Sciences
Research Council [grant number EP/M026698/1].

A wide variety of DOA estimation algorithms have been
proposed for use in the SH domain [5]–[14]. Most of these
compute a metric over a dense azimuth-inclination grid before
identifying its peak(s) as the DOA(s). Such methods include
those that compute the Steered Response Power (SRP) due to
a beamformer which is steered towards all potential source
directions and those that compute the spatial spectrum using
subspace methods based on Multiple Signal Classification
(MUSIC) [15].
Many current DOA estimation methods make use of the
spatial covariance matrix [7], [9], [10], [12]. For example,
the SRP map produced by a Minimum Variance Distortionless
Response (MVDR) beamformer optimally rejects background
noise for each look direction by adjusting its beam pattern
according to the spatial covariance matrix and MUSIC [15]
directly decomposes the spatial covariance matrix into signal
and noise subspaces. However, in reverberation, coherent
reflections distort the spatial covariance matrix. For the MVDR
beamformer this is manifested as incorrectly placed attenua-
tion in the beam pattern. For MUSIC the fact that the reflections
are linearly dependent on the direct path signals means the
rank of the covariance matrix is reduced and division between
signal and noise subspaces can be prone to errors.
Frequency Smoothing (FS) [16] has been shown to improve
the accuracy of DOA estimation using MUSIC [7] and MVDR-
SRP [10]. The procedure decorrelates coherent reflections by
combining information across multiple frequency bands. In
the spatial domain, where microphone signals are processed
directly, special focussing matrices and an initial DOA estimate
are required. In the SH domain, FS can be applied as a
straightforward average by assuming frequency independence
of the (mode strength compensated) array manifold [7], [10].
To estimate multiple source DOAs, a number of authors have
proposed methods which exploit the sparsity of speech in the
Time-Frequency (TF) domain. By identifying TF-regions where
a single source is dominant, single source DOA estimation
methods can be employed locally to those regions [12], [17],
[18]. This class of methods exploits the principle that, for
a single dominant source, the rank of the spatial covariance
matrix is unity. In [18] pairwise correlations between adjacent
microphones of a circular array were estimated by averaging
over frequency bins within a single time frame. In [17]
the spatial covariance matrix between all microphones was
estimated at each frequency bin by averaging over time frames.
In [12] it was shown that estimating the spatial covariance
matrix by averaging (smoothing) over time and frequency
decorrelates the reflections and so the rank is only unity when
a single direct path is dominant. Accuracy of the subsequent
DOA estimation is substantially improved but the Direct-Path
Dominance (DPD) test described in [12] is reported to be
passed in only 3% of TF-regions. This may lead to time frames
in which there are no DOA estimates, which is problematic in
applications where the sources are moving.
Methods for DOA estimation in the SH domain which exploit
the directional sparsity of sound sources have been proposed
in a series of related works [19]–[22]. In [19] Independent
Component Analysis (ICA) of the SH domain signals was
performed and the DOAs estimated by comparing the columns
of the unmixing matrix to the steering vectors for plane waves
from all possible directions. In [20] the directional component
of the SH domain signals was obtained by subtracting an
estimate of the diffuse component, which was determined
using a subspace approach. An iterative optimization was then
performed to find a sparse set of weights for a dense dictionary
of plane wave elements. The directions associated with the
selected elements represent the estimated DOAs. In [21] and
[22] various approaches to combining the methods of [19] and
[20] were proposed, each with their own success in a particular
application scenario. However, none of these included live
recordings of real-world audio, where small source movements
may be important.
Intensity-based DOA estimation [8], [11], [13], [14] differs
from the previously discussed methods because, by directly
computing the direction of energy flow, there is no need to
compute a spatial cost function. This has the potential for
significant computational savings. The component of intensity
in a particular direction has been measured using two types of
intensity probe [23]. One approximates particle velocity using
the difference between two closely spaced omnidirectional
pressure sensors while the other measures particle velocity
directly [24]. The former approach is more common but
is sensitive to phase mismatch and sensor noise. Using an
array of intensity probes yields an intensity vector in 2 or 3
dimensions, from which the DOA can be found [25], [26].
In [8] DOA estimation using a spherical microphone array
was proposed whereby a large number of microphones was
used to transform the sound field into the SH domain from
which the particle-velocity was approximated. The resulting
vectors were termed Pseudo-Intensity Vectors (PIVs). Those
initial results demonstrated the effectiveness of the method
for single source DOA estimation in a noise-free environment.
In [11] DOA estimation of multiple sources was achieved using
k-means clustering of PIVs. In both [27] and [28] a DOA was
obtained for each TF-bin by first finding the PIV and then
refining the direction by evaluating a cost function using higher
order SHs over a spatially constrained grid around the PIV
direction. The final estimates of multiple sources’ DOAs were
then obtained by identifying the peaks of a histogram of all
the individual direction estimates.
In this current paper we review the formulation and use
of PIVs presented in [8]. We then provide a novel formula-
tion of the PIV which follows directly from the SH domain
representation of a sound field expressed in Cartesian form
and develop an extended analysis of PIVs under non-ideal
conditions. Further, we propose the Subspace Pseudo-Intensity
Vector (SSPIV) which we show to be more robust to noise and
reverberation than the PIV. Like DPD-MUSIC, it exploits FS and
subspace decomposition and assumes TF-sparsity of the input
signal. However, by directly computing a DOA for each TF-
region, rather than evaluating the spatial spectrum over all
possible directions, it is computationally more efficient. We
investigate the criteria under which smoothed histograms of
PIVs and SSPIVs give accurate estimates of the DOAs of multiple
sources in a noisy reverberant environment, including when
sources are moving. Some of the first steps of an earlier version
of the SSPIV method were presented in [13] and [29]. The
current paper extends both the theoretical analysis and the
evaluation of the PIV method compared to [8], especially in
the context of multiple and moving speakers and in real-world
applications.
The remainder of this paper is organized as follows. Sec. II
reviews the SH domain representation of a sound field. Sec. III
presents the PIV and SSPIV methods. Sec. IV analyses PIV and
SSPIV under non-ideal conditions, whether these be caused by
an interfering (independent or correlated) sound source, diffuse
noise or sensor noise. Sec. V presents simulated experiments
comparing the intensity-based methods to classical and state-
of-the-art DOA estimation methods. Sec. VI demonstrates the
effectiveness of the methods in real-world tests. Finally the
paper is concluded in Sec. VII.
II. REVIEW OF SH REPRESENTATION OF A SOUND FIELD
The SH representation of a sound field [4], [30] around a particular point in space is determined by the complex-valued plane-wave density $a(k, \theta, \phi)$, which is a function of wavenumber $k$, inclination $\theta$ and azimuth $\phi$. A unit vector pointing towards the $n$-th plane wave, $\mathbf{x}_n = [x_n \; y_n \; z_n]^T$, where $(\cdot)^T$ is the transpose operator, has DOA, $\Psi_n = (\theta_n, \phi_n)$, given by
$$\theta_n = \arccos(z_n), \quad \phi_n = \operatorname{arctan2}(y_n, x_n) \tag{1}$$
where arctan2 is the arctangent function mapped to the correct quadrant according to the signs of $x_n$ and $y_n$. A plane-wave density composed of $N$ plane waves is given by
$$a(k, \theta, \phi) = \sum_{n=1}^{N} \delta(\cos\theta - \cos\theta_n)\, \delta(\phi - \phi_n)\, s_n(k) \tag{2}$$
where $s_n(k)$ is the amplitude of the $n$-th plane wave and $\delta(\cos\theta)\,\delta(\phi)$ is the Dirac delta function on the sphere, which is zero everywhere on the sphere except $(\theta, \phi) = (\pi/2, 0)$. The complex SHs of order $l$ and degree $m \in \{-l, \dots, l\}$ provide a set of orthogonal basis functions defined over the unit sphere [30]
$$Y_l^m(\theta, \phi) = \sqrt{\frac{2l+1}{4\pi} \frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi} \tag{3}$$
where $P_l^m(\cdot)$ is the associated Legendre function such that
$$a(k, \theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm}(k)\, Y_l^m(\theta, \phi). \tag{4}$$
Substituting $\Omega = (\theta, \phi)$, the weights of each SH are the Spherical Fourier Transform (SFT) of $a(k, \theta, \phi)$
$$a_{lm}(k) = \int_{\Omega \in S^2} a(k, \Omega)\, [Y_l^m(\Omega)]^* \, d\Omega \tag{5}$$
where $\int_{\Omega \in S^2} d\Omega = \int_0^{2\pi} \int_0^{\pi} \sin\theta \, d\theta \, d\phi$ is the integral over the unit sphere and $(\cdot)^*$ denotes conjugation. Substituting (2) into (5) gives
$$a_{lm}(k) = \sum_{n=1}^{N} [Y_l^m(\Psi_n)]^* \, s_n(k). \tag{6}$$
Considering the $(L+1)^2$ SHs up to $l \le L$, (6) is expressed in stacked vector notation as [12]
$$\mathbf{a}_{lm}(k) = \mathbf{Y}(\Psi)^H \mathbf{s}(k) \tag{7}$$
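where subscript $lm$ on a vector denotes that the elements are SH coefficients, $\mathbf{s}(k) = [s_1(k) \dots s_N(k)]^T$, $\Psi = [\Psi_1 \dots \Psi_N]^T$,
$$\mathbf{Y}(\Psi) = \begin{bmatrix} \mathbf{y}(\Psi_1) \\ \vdots \\ \mathbf{y}(\Psi_N) \end{bmatrix}, \tag{8}$$
$$\mathbf{y}(\Psi_n) = \begin{bmatrix} Y_0^0(\Psi_n) & Y_1^{-1}(\Psi_n) & Y_1^0(\Psi_n) & Y_1^1(\Psi_n) & \dots & Y_L^L(\Psi_n) \end{bmatrix}$$
and $(\cdot)^H$ denotes the conjugate transpose.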
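As an illustration of (6) to (8), the following is a minimal sketch restricted to $L = 1$, assuming the SH definition in (3) with the Condon-Shortley phase included in $P_l^m$. The function names `sh_order1` and `plane_wave_density_coeffs` and the example DOAs and amplitudes are hypothetical and are not taken from the paper.

```python
import numpy as np

def sh_order1(theta, phi):
    """Complex SHs up to order 1 at inclination theta and azimuth phi,
    ordered [Y_0^0, Y_1^-1, Y_1^0, Y_1^1] as in y(Psi_n)."""
    return np.array([
        np.sqrt(1 / (4 * np.pi)),
        np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        np.sqrt(3 / (4 * np.pi)) * np.cos(theta),
        -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

def plane_wave_density_coeffs(doas, s):
    """Eq. (7) for L = 1: a_lm = Y(Psi)^H s, with one row of Y per DOA."""
    Y = np.array([sh_order1(th, ph) for th, ph in doas])  # N x (L+1)^2
    return Y.conj().T @ np.asarray(s, dtype=complex)

# Two illustrative plane waves (assumed values).
doas = [(np.pi / 2, 0.0), (np.pi / 3, np.pi / 4)]
s = [1.0, 0.5j]
a_lm = plane_wave_density_coeffs(doas, s)
print(a_lm)
```

Because the steering vectors do not depend on $k$, the same matrix `Y` would be reused at every frequency in this sketch.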
The SH domain representation of the plane-wave density, as expressed in (7), is useful because the steering vectors, $\mathbf{y}(\Psi_n)$, are analytic functions which are independent of frequency.
In order to obtain this representation, the sound field in the vicinity of the point of interest must be observed. The pressure at a particular point is related to the plane-wave density by the mode strength, which depends on the distance of the point from the origin and whether a rigid scatterer is present [2], [3], [30]. Although irregular sampling schemes are possible, for mathematical convenience we use the pressure on the surface of a sphere of radius $r$ centered at the origin, $p(k, r, \Omega)$, for which the mode strength can be denoted $b_l(kr)$. The SFT of this function is
$$\mathbf{p}_{lm}(k, r) = \mathbf{B}(kr)\, \mathbf{a}_{lm}(k) \tag{9}$$
where $\mathbf{B}(kr) = \operatorname{diag}\{b_0 \; b_1 \; b_1 \; b_1 \; \dots \; b_L\}$, $\mathbf{p}_{lm}(k, r) = [p_{00} \; p_{1(-1)} \; p_{10} \; p_{11} \; \dots \; p_{LL}]^T$ is a vector of SH coefficients and the functional dependence of the stacked terms has been omitted for clarity. Sampling $p(k, r, \Omega)$ at $Q$ points with directions $\{\Omega_q\}_{q=1}^{Q}$, the SFT is approximated using the discrete SFT [4]
$$\mathbf{p}_{lm}(k, r) \cong \mathbf{Y}(\mathbf{\Omega})^H \mathbf{W} \mathbf{p}(k, r) \tag{10}$$
where $\mathbf{p}(k, r) = [p_1 \dots p_Q]^T$ is the pressure at each of the sample points, $\mathbf{W} = \operatorname{diag}\{w_1 \; w_2 \; \dots \; w_Q\}$, where $\{w_q\}_{q=1}^{Q}$ are the weights of the sampling scheme, and $\mathbf{Y}(\mathbf{\Omega})$ is a $Q \times (L+1)^2$ matrix defined as in (8) but with the SHs evaluated at $\{\Omega_q\}_{q=1}^{Q}$. For the approximation in (10) to hold up to the maximum spherical harmonic order, $L$, requires that there are sufficient microphones, $Q \ge (L+1)^2$, and that they are adequately distributed over the sphere [31]. Furthermore, for a given radius, the error in the approximation of (10) increases with frequency. In practice the upper threshold is commonly taken as $kr < L$ [7], [12], although to avoid spatial aliasing requires $kr \ll L$ [2], [31]. It has also been shown that to accurately reproduce the pressure at a point due to a plane wave using the inverse SFT requires a much more conservative threshold [32].
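To make the sampling conditions concrete, the short sketch below evaluates the minimum microphone count $Q \ge (L+1)^2$ and the frequency at which $kr = L$, using $k = 2\pi f / c$. The speed of sound `c = 343.0` m/s and the helper names are assumptions for illustration; the 4.2 cm radius is chosen to match the rigid array used for the recordings reported for this work.

```python
import math

def max_frequency_hz(order, radius_m, c=343.0):
    """Frequency at which kr = L, i.e. the commonly used upper limit kr < L."""
    return order * c / (2.0 * math.pi * radius_m)

def min_microphones(order):
    """Minimum number of sample points, Q >= (L + 1)^2."""
    return (order + 1) ** 2

radius_m = 0.042  # array radius in metres (illustrative)
limits = {L: max_frequency_hz(L, radius_m) for L in (1, 2, 3, 4)}
for L, f in limits.items():
    print(f"L={L}: Q >= {min_microphones(L)}, kr < L up to about {f:.0f} Hz")
```

The limit scales linearly with $L$ and inversely with $r$, so the stricter condition $kr \ll L$ would place the usable band well below these values.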
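Equating (9) and (10) the plane-wave density can be obtained as
$$\mathbf{a}_{lm}(k) = \mathbf{B}(kr)^{-1}\, \mathbf{Y}(\mathbf{\Omega})^H \mathbf{W} \mathbf{p}(k, r). \tag{11}$$
Let $\mathbf{x}(k, r) = \mathbf{p}(k, r) + \mathbf{v}(k)$ be the observation of $\mathbf{p}(k, r)$ in the presence of sensor noise which we assume to be zero-mean, normally distributed, uncorrelated between sensors and uncorrelated with $\mathbf{s}(k)$. Applying the SFT and compensating for the mode strength, the observed plane-wave density is
$$\tilde{\mathbf{x}}_{lm}(k) = \mathbf{Y}(\Psi)^H \mathbf{s}(k) + \tilde{\mathbf{v}}_{lm}(k) \tag{12}$$
where
$$\tilde{\mathbf{v}}_{lm}(k) = \mathbf{B}(kr)^{-1}\, \mathbf{Y}(\mathbf{\Omega})^H \mathbf{W} \mathbf{v}(k). \tag{13}$$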
III. PSEUDO-INTENSITY VECTOR FORMULATION
The pseudo-intensity vector was proposed in [8] as an
approximation to the active intensity vector. This approach
is reviewed in Sec. III-A while in Sec. III-B an equivalent
vector is derived directly from the SH representation of the
sound field. Finally, in Sec. III-C, the SSPIV is formulated.
A. Review of sound intensity and pseudo-intensity
The active intensity vector is defined as the time-averaged magnitude and direction of the net flow of energy and is given by [30]
$$\mathbf{I}(k) = \frac{1}{2} \Re\left\{ p^*(k)\, \mathbf{u}(k) \right\} \tag{14}$$
where $p(k)$ is the omnidirectional pressure, $\mathbf{u}(k) = [u_x(k) \; u_y(k) \; u_z(k)]^T$ is a vector of the particle velocities in the Cartesian directions and $\Re\{\cdot\}$ is the real operator. It is useful for DOA estimation because acoustic energy flows in the direction of wave propagation. For a plane wave, the particle velocity vector is related to direction of arrival $(\theta, \phi)$ as [8]
$$\mathbf{u}(k) = -\frac{p(k)}{\rho_0 c} \begin{bmatrix} \sin\theta \cos\phi \\ \sin\theta \sin\phi \\ \cos\theta \end{bmatrix} \tag{15}$$
where $\rho_0$ and $c$ are the ambient density and speed of sound in the medium, respectively. It can be seen that the elements of $\mathbf{u}(k)$ have dipole directivity patterns aligned with the Cartesian axes and that the resulting vector points in the opposite direction from the DOA.
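The relationship between (14), (15) and the DOA can be checked numerically. This is a minimal sketch for a single ideal plane wave; the values `rho0 = 1.2` kg/m^3 and `c = 343.0` m/s, the pressure value and the function names are assumptions chosen for illustration.

```python
import numpy as np

def doa_unit_vector(theta, phi):
    """Unit vector pointing towards inclination theta and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def plane_wave_intensity(p, theta, phi, rho0=1.2, c=343.0):
    """Active intensity of a plane wave arriving from (theta, phi)."""
    u = -(p / (rho0 * c)) * doa_unit_vector(theta, phi)  # eq. (15)
    return 0.5 * np.real(np.conj(p) * u)                 # eq. (14)

p = 0.3 * np.exp(1j * 0.7)                  # complex pressure (illustrative)
theta, phi = np.deg2rad(60.0), np.deg2rad(-30.0)
I = plane_wave_intensity(p, theta, phi)

# Energy flows away from the source, so the DOA is the negated, normalized vector.
doa_est = -I / np.linalg.norm(I)
print(doa_est, doa_unit_vector(theta, phi))
```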
A beamformer with a dipole directivity pattern can be obtained directly from first order SH coefficients as
$$D(k, \varphi, \mathbf{a}_{lm}(k)) = \sum_{m=-1}^{1} Y_1^m(\varphi)\, a_{1m}(k) \tag{16}$$
where $\varphi$ is the steering direction. Therefore to approximate (14) using the SH coefficients of the plane-wave density function the PIV is formulated [8]
$$\mathbf{I}(k) = \frac{1}{2} \Re\left\{ a_{00}^*(k) \begin{bmatrix} D(k, \varphi_x, \mathbf{a}_{lm}(k)) \\ D(k, \varphi_y, \mathbf{a}_{lm}(k)) \\ D(k, \varphi_z, \mathbf{a}_{lm}(k)) \end{bmatrix} \right\} \tag{17}$$
where $\varphi_x = (\pi/2, \pi)$, $\varphi_y = (\pi/2, -\pi/2)$ and $\varphi_z = (\pi, 0)$.
B. Alternative formulation of PIV
From (6) the plane-wave decomposition for the $n$-th plane wave is $a_{lm}^{(n)}(k) = [Y_l^m(\Psi_n)]^* s_n(k)$. Expressing the first order coefficients in Cartesian form gives
$$a_{1(-1)}^{(n)}(k) = s_n(k) \sqrt{3/8\pi}\, (x_n + i y_n) \tag{18a}$$
$$a_{10}^{(n)}(k) = s_n(k) \sqrt{3/4\pi}\, z_n \tag{18b}$$
$$a_{11}^{(n)}(k) = -s_n(k) \sqrt{3/8\pi}\, (x_n - i y_n) \tag{18c}$$
where the SHs are evaluated on the unit sphere. Rearranging (18) gives
$$s_n(k)\, x_n = \sqrt{\frac{8\pi}{3}}\, \frac{1}{2} \left( a_{1(-1)}^{(n)}(k) - a_{11}^{(n)}(k) \right) \tag{19a}$$
$$s_n(k)\, y_n = \sqrt{\frac{8\pi}{3}}\, \frac{1}{2i} \left( a_{1(-1)}^{(n)}(k) + a_{11}^{(n)}(k) \right) \tag{19b}$$
$$s_n(k)\, z_n = \sqrt{\frac{8\pi}{3}}\, \frac{1}{\sqrt{2}}\, a_{10}^{(n)}(k) \tag{19c}$$
which can be interpreted as a weighted sum of the 1st-order plane-wave decomposition coefficients. Moreover, the weight corresponding to each $a_{1m}^{(n)}(k)$ is proportional to the order 1, degree $m$ SH evaluated in the required axial direction as
$$s_n(k)\, \varpi_n = \frac{4\pi}{3} \sum_{m=-1}^{1} Y_1^m(\varphi_\varpi)\, a_{1m}^{(n)}(k) \tag{20}$$
$$= \frac{4\pi}{3}\, D(k, \varphi_\varpi, \mathbf{a}_{lm}^{(n)}(k)) \tag{21}$$
where (16) has been used to obtain (21), $\varpi \in \{x, y, z\}$, $\varphi_x = (\pi/2, 0)$, $\varphi_y = (\pi/2, \pi/2)$ and $\varphi_z = (0, 0)$. To obtain a vector pointing towards the $n$-th DOA, we note that $a_{00}^{(n)}(k) = \sqrt{\frac{1}{4\pi}}\, s_n(k)$ and evaluate (21) for $\varpi \in \{x, y, z\}$ leading to
$$\tilde{\mathbf{I}}(k) = \sqrt{4\pi}\, \frac{4\pi}{3}\, \Re\left\{ a_{00}^{(n)*}(k) \begin{bmatrix} D(k, \varphi_x, \mathbf{a}_{lm}^{(n)}(k)) \\ D(k, \varphi_y, \mathbf{a}_{lm}^{(n)}(k)) \\ D(k, \varphi_z, \mathbf{a}_{lm}^{(n)}(k)) \end{bmatrix} \right\} \tag{22}$$
$$= \Re\left\{ |s_n(k)|^2\, \mathbf{x}_n \right\} \tag{23}$$
where, for a single plane wave in noise free conditions, the argument to the real operator is intrinsically real but the real operator may be needed in practical implementations with finite precision. The direction of the PIV in spherical coordinates can be extracted using (1) from the unit vector given by $\tilde{\mathbf{I}}(k) / \|\tilde{\mathbf{I}}(k)\|$ where $\|\cdot\|$ denotes the $\ell_2$-norm.
The formulation of (22) is structurally identical to (17), but $\mathbf{I}(k)$ and $\tilde{\mathbf{I}}(k)$ point in opposite directions due to the steering of the dipoles. Moreover, the inclusion of the $\sqrt{4\pi}\, 4\pi/3$ normalizing constant in (22) leads to the simplified form of (23) which will make the notation of the subsequent analysis more straightforward. For historical reasons $\tilde{\mathbf{I}}(k)$ is hereafter referred to as the PIV but its orientation towards the DOA is preferred for simplicity of describing the methods.
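The chain from (6) through (16), (22) and (23) back to a DOA via (1) can be sketched for one noise-free plane wave as follows. The sketch assumes the SH convention of (3) with the Condon-Shortley phase; the names `sh_order1`, `dipole`, `piv` and `doa_from_vector` and the example source values are hypothetical.

```python
import numpy as np

def sh_order1(theta, phi):
    """[Y_0^0, Y_1^-1, Y_1^0, Y_1^1] at inclination theta, azimuth phi."""
    return np.array([
        np.sqrt(1 / (4 * np.pi)),
        np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        np.sqrt(3 / (4 * np.pi)) * np.cos(theta),
        -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

# Dipole steering directions towards +x, +y and +z, as used with (21).
STEER = [(np.pi / 2, 0.0), (np.pi / 2, np.pi / 2), (0.0, 0.0)]

def dipole(steer, a_lm):
    """Eq. (16): sum over m of Y_1^m(steer) * a_1m."""
    return np.sum(sh_order1(*steer)[1:] * a_lm[1:4])

def piv(a_lm):
    """Eq. (22): pseudo-intensity vector oriented towards the DOA."""
    d = np.array([dipole(st, a_lm) for st in STEER])
    return np.sqrt(4 * np.pi) * (4 * np.pi / 3) * np.real(np.conj(a_lm[0]) * d)

def doa_from_vector(v):
    """Eq. (1): inclination and azimuth of the normalized vector."""
    x, y, z = v / np.linalg.norm(v)
    return np.arccos(z), np.arctan2(y, x)

theta_n, phi_n = np.deg2rad(70.0), np.deg2rad(130.0)   # assumed source DOA
s_n = 0.8 * np.exp(1j * 1.1)                           # assumed amplitude
a_lm = np.conj(sh_order1(theta_n, phi_n)) * s_n        # eq. (6), single wave
I_tilde = piv(a_lm)
theta_hat, phi_hat = doa_from_vector(I_tilde)
print(np.rad2deg(theta_hat), np.rad2deg(phi_hat))
```

In this ideal case the vector length equals $|s_n(k)|^2$, in agreement with (23), and no search over candidate directions is involved.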
C. Subspace PIV
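The SSPIV extends the concept of PIVs to take advantage of higher order SHs and frequency smoothing and is aimed at providing more accurate and reliable DOA estimates in the presence of multiple and interfering sound sources and reverberation. It follows from (7) that the covariance of $\mathbf{a}_{lm}$ is [12]
$$\mathbf{R}_{\mathbf{a}_{lm}} = E\left\{ \mathbf{a}_{lm} \mathbf{a}_{lm}^H \right\} \tag{24}$$
$$= \mathbf{Y}^H(\Psi)\, \mathbf{R}_s\, \mathbf{Y}(\Psi) \tag{25}$$
where $\mathbf{R}_s = E\left\{ \mathbf{s}\mathbf{s}^H \right\}$. Singular Value Decomposition (SVD) leads to
$$\mathbf{R}_{\mathbf{a}_{lm}} = \mathbf{U} \mathbf{\Sigma} \mathbf{U}^H = [\mathbf{U}_s \; \mathbf{U}_n] \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_n \end{bmatrix} \begin{bmatrix} \mathbf{U}_s^H \\ \mathbf{U}_n^H \end{bmatrix} \tag{26}$$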
where $\mathbf{U}$ is a unitary matrix, $\mathbf{\Sigma}$ is a diagonal matrix containing the singular values of $\mathbf{R}_{\mathbf{a}_{lm}}$ and $\mathbf{U}_s$ and $\mathbf{U}_n$, respectively, represent the conventional partitioning into signal and noise subspaces [15]. In the simplest case of a single plane wave, $\mathbf{U}_s = [\hat{a}_{00} \; \hat{a}_{1(-1)} \; \hat{a}_{10} \; \hat{a}_{11} \; \dots \; \hat{a}_{LL}]^T$ is a column vector and is proportional to the steering vector for the plane wave DOA, $\mathbf{y}(\Psi_n)$. The SSPIV method applies the PIV method (cf. (22) and (16)) to the one-dimensional signal subspace as
$$\tilde{\mathbf{I}}_{ss} = \sqrt{4\pi}\, \frac{4\pi}{3}\, \Re\left\{ \hat{a}_{00}^* \begin{bmatrix} D(k, \varphi_x, \mathbf{U}_s) \\ D(k, \varphi_y, \mathbf{U}_s) \\ D(k, \varphi_z, \mathbf{U}_s) \end{bmatrix} \right\} \tag{27}$$
to obtain a vector pointing towards the source. Whilst (27) depends only on the 0th and 1st order components of $\mathbf{U}_s$, through (25) and (26), their values do depend on the higher order SH terms of $\mathbf{a}_{lm}$. As with the PIV method, the benefit of this approach is that a direction is obtained for each TF-region directly, i.e. without evaluating all possible directions. The implications of violating the assumption that a single plane wave is present are addressed in Sec. IV.
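The steps in (24), (26) and (27) can be sketched as below. This is a simplified illustration rather than the configuration evaluated in the paper: it uses only $L = 1$ (so it cannot show the benefit of higher orders), estimates the covariance from simulated snapshots of one plane wave in white noise instead of time-frequency smoothing, and all names, the noise level and the snapshot count are assumptions.

```python
import numpy as np

def sh_order1(theta, phi):
    """[Y_0^0, Y_1^-1, Y_1^0, Y_1^1] at inclination theta, azimuth phi."""
    return np.array([
        np.sqrt(1 / (4 * np.pi)),
        np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        np.sqrt(3 / (4 * np.pi)) * np.cos(theta),
        -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

STEER = [(np.pi / 2, 0.0), (np.pi / 2, np.pi / 2), (0.0, 0.0)]  # +x, +y, +z

def sspiv(R):
    """Eq. (27): apply the PIV operation to the principal singular vector of R."""
    U, _, _ = np.linalg.svd(R)        # eq. (26); singular values are descending
    u_s = U[:, 0]                     # one-dimensional signal subspace
    d = np.array([np.sum(sh_order1(*st)[1:] * u_s[1:4]) for st in STEER])
    return np.sqrt(4 * np.pi) * (4 * np.pi / 3) * np.real(np.conj(u_s[0]) * d)

rng = np.random.default_rng(0)
theta_n, phi_n = np.deg2rad(50.0), np.deg2rad(-100.0)   # assumed source DOA
T = 200                                                 # number of snapshots
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
noise = 0.05 * (rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T)))
A = np.outer(np.conj(sh_order1(theta_n, phi_n)), s) + noise  # eq. (12) per snapshot
R = A @ A.conj().T / T                                       # sample estimate of (24)

I_ss = sspiv(R)
x_n = np.array([np.sin(theta_n) * np.cos(phi_n),
                np.sin(theta_n) * np.sin(phi_n),
                np.cos(theta_n)])
cos_err = np.dot(I_ss, x_n) / np.linalg.norm(I_ss)
print(np.rad2deg(np.arccos(np.clip(cos_err, -1.0, 1.0))))
```

The arbitrary complex scaling of the singular vector cancels in the product $\hat{a}_{00}^* D(\cdot)$, so no phase alignment of $\mathbf{U}_s$ is needed in this sketch.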
IV. PIV AND SSPIV DISTRIBUTIONS FOR REPRESENTATIVE
EXAMPLE SOUND FIELDS
As described in Sec. II, an arbitrary sound field can be
decomposed into a sum of plane waves. In this section we
consider how the PIVs and SSPIVs are affected by amplitude,
phase and directional relationships between two plane waves.
These simplified cases provide some insight into the behavior
of pseudo-intensity vectors in real acoustic environments.
[Figure 1: vectors plotted in the $x$-$y$ plane for phase differences of 0°, 45°, 90°, 135° and 180°.]
Fig. 1. $\tilde{\mathbf{I}}$ for selected values of $|\beta_1 - \beta_2|$ with fixed $\alpha_1$, $\alpha_2$, $\mathbf{x}_1$, and $\mathbf{x}_2$. $\alpha_1^2 \mathbf{x}_1$ and $\alpha_2^2 \mathbf{x}_2$ are shown for reference.
A. Two plane waves - general case
For two plane waves with DOAs given by the unit vectors, $\mathbf{x}_n$, $n = \{1, 2\}$, and source signals $s_n(k) = \alpha_n(k)\, e^{i\beta_n(k)}$, where $\alpha_n(k)$ and $\beta_n(k)$ are the magnitude and phase at the origin, respectively, the PIV is obtained from (2) and (22) as
$$\tilde{\mathbf{I}} = \Re\left\{ (s_1 + s_2)^* (s_1 \mathbf{x}_1 + s_2 \mathbf{x}_2) \right\} \tag{28}$$
$$= \alpha_1^2 \mathbf{x}_1 + \alpha_2^2 \mathbf{x}_2 + (\mathbf{x}_1 + \mathbf{x}_2)\, \alpha_1 \alpha_2 \cos(\beta_1 - \beta_2) \tag{29}$$
where for brevity the dependence on $k$ is assumed. This is interesting because it implies that the resulting vector lies on the plane containing the vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ but that it does not necessarily lie between the two. To illustrate this point, Fig. 1 shows $\tilde{\mathbf{I}}$ for various values of $|\beta_1 - \beta_2|$. The resulting vector is nominally distributed about the direction of the stronger source (i.e. $\mathbf{x}_1$) but is either drawn towards the direction of the weaker source (i.e. $\mathbf{x}_2$) or repelled from it, depending on the relative amplitudes and phases of the signals.
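The attraction and repulsion described above follow directly from (29). The sketch below evaluates it for a few phase differences; the amplitudes and the orthogonal DOAs are arbitrary illustrative choices and are not the values used to produce Fig. 1.

```python
import numpy as np

def piv_two_waves(alpha1, alpha2, x1, x2, dbeta):
    """Eq. (29): PIV for two plane waves with phase difference dbeta."""
    return (alpha1**2 * x1 + alpha2**2 * x2
            + (x1 + x2) * alpha1 * alpha2 * np.cos(dbeta))

x1 = np.array([1.0, 0.0, 0.0])   # stronger source on the +x axis (assumed)
x2 = np.array([0.0, 1.0, 0.0])   # weaker source on the +y axis (assumed)
alpha1, alpha2 = 1.0, 0.5

az_by_phase = {}
for deg in (0, 45, 90, 135, 180):
    I = piv_two_waves(alpha1, alpha2, x1, x2, np.deg2rad(deg))
    # Azimuth of the PIV measured from x1 towards x2, in degrees.
    az_by_phase[deg] = np.degrees(np.arctan2(I[1], I[0]))
    print(deg, I[:2], az_by_phase[deg])
```

With these values the PIV lies between the two sources when the signals are in phase and on the far side of $\mathbf{x}_1$ (negative azimuth) when they are in antiphase, which is the behavior the text describes.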
The SSPIV depends on the SVD of $\mathbf{R}_{\mathbf{a}_{lm}}$, which is determined primarily by the source covariance, $\mathbf{R}_s = \begin{bmatrix} \sigma_1^2 & \sigma_{21} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two plane waves and $\sigma_{12}$, $\sigma_{21}$ is their covariance. The dimensionality of $\mathbf{R}_{\mathbf{a}_{lm}}$ depends on the maximum SH order, $L$, but is independent of the number of plane waves.
B. Uncorrelated sources
Consider two uncorrelated sources in a free-field with fixed DOAs and amplitude ratio. We assume that $\beta_1$ and $\beta_2$ are independent with identical uniform distribution $U(0, 2\pi)$ such that $\Delta\beta = \beta_1 - \beta_2$ has a triangular distribution over the interval $\Delta\beta \in [-2\pi, 2\pi]$ which, due to periodicity of the phase, reduces to $\Delta\beta \in [-\pi, \pi]$ with probability $p(\Delta\beta) = 1/(2\pi)$. The expected value of $\tilde{\mathbf{I}}$ is obtained by integrating (29) with respect to $\Delta\beta$,
$$E\{\tilde{\mathbf{I}}\} = \int_{-\pi}^{\pi} \tilde{\mathbf{I}}\, p(\Delta\beta)\, d\Delta\beta \tag{30}$$
$$= \alpha_1^2 \mathbf{x}_1 + \alpha_2^2 \mathbf{x}_2 + \frac{(\mathbf{x}_1 + \mathbf{x}_2)\, \alpha_1 \alpha_2}{2\pi} \int_{-\pi}^{\pi} \cos(\beta_1 - \beta_2)\, d\Delta\beta \tag{31}$$
$$= \alpha_1^2 \mathbf{x}_1 + \alpha_2^2 \mathbf{x}_2. \tag{32}$$
[Figure 2: two panels, (a) and (b), each plotting Error angle [deg] against Interferer angle [deg] from 0° to 180°, with curves for $E\{\tilde{\mathbf{I}}\}$ and for $\tilde{\mathbf{I}}_{ss}$ with $L = 1$, $L = 3$ and $L = 7$.]
Fig. 2. Error in $E\{\tilde{\mathbf{I}}\}$ and $\tilde{\mathbf{I}}_{ss}$ for $L = \{1, 3, 7\}$ as function of $\angle(\mathbf{x}_1, \mathbf{x}_2)$ with (a) SIR 20 dB and (b) SIR 3 dB.
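The result in (32) can be checked with a small Monte Carlo experiment under the stated assumption of independent, uniformly distributed phases. The sample size, seed, amplitudes and DOAs below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

x1 = np.array([1.0, 0.0, 0.0])                       # assumed DOA of source 1
x2 = np.array([np.cos(np.deg2rad(60.0)),
               np.sin(np.deg2rad(60.0)), 0.0])       # assumed DOA of source 2
alpha1, alpha2 = 1.0, 0.5

# Independent phases, each uniform on [0, 2*pi).
beta1 = rng.uniform(0.0, 2.0 * np.pi, n_samples)
beta2 = rng.uniform(0.0, 2.0 * np.pi, n_samples)

# Eq. (29) evaluated for every phase pair; result has shape (n_samples, 3).
cross = alpha1 * alpha2 * np.cos(beta1 - beta2)
I_samples = alpha1**2 * x1 + alpha2**2 * x2 + np.outer(cross, x1 + x2)

I_mean = I_samples.mean(axis=0)
expected = alpha1**2 * x1 + alpha2**2 * x2           # eq. (32)
print(I_mean, expected)
```

The cross term averages towards zero, so the mean PIV approaches the energy-weighted sum of the two DOA vectors even though individual PIVs can deviate substantially.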
The SSPIV is determined by the source covariance, $\mathbf{R}_s = \begin{bmatrix} \alpha_1^2 & 0 \\ 0 & \alpha_2^2 \end{bmatrix}$, the DOAs and the maximum order of SHs considered. Without loss of generality, let $\mathbf{x}_1$ point in the direction of the desired source such that the Signal-to-Interference Ratio (SIR) in dB is $10 \log_{10}(\alpha_1^2 / \alpha_2^2) \ge 0$. Figure 2 shows the error angle $\angle(\mathbf{x}_1, \tilde{\mathbf{I}}_{ss})$ as a function of the interferer angle $\angle(\mathbf{x}_1, \mathbf{x}_2)$ for SIRs of 20 dB and 3 dB and for different values of $L$. These plots were produced by collating, without averaging, SSPIVs calculated according to (25), (26) and (27) for interferers at 794 approximately equally distributed directions and 5 random target directions. The variation in error as a function of interferer angle has multiple peaks and nulls corresponding to the number of lobes in the real (or imaginary) part of highest order SH considered but is independent of the target direction. Increasing $L$ reduces the worst case error, which confirms that higher order SHs are being utilized by the SSPIV. Also shown is $\angle(\mathbf{x}_1, E\{\tilde{\mathbf{I}}\})$ where $E\{\tilde{\mathbf{I}}\}$ is calculated according to (32). Figure 2(a) shows that PIVs and SSPIVs are both accurate to

References
Journal ArticleDOI

Multiple emitter location and signal parameter estimation

TL;DR: In this article, a description of the multiple signal classification (MUSIC) algorithm, which provides asymptotically unbiased estimates of 1) number of incident wavefronts present; 2) directions of arrival (DOA) (or emitter locations); 3) strengths and cross correlations among the incident waveforms; 4) noise/interference strength.
Journal ArticleDOI

Image method for efficiently simulating small‐room acoustics

TL;DR: The theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room, when convolved with any desired input signal, simulates room reverberation of the input signal.
Dataset

TIMIT Acoustic-Phonetic Continuous Speech Corpus

TL;DR: The TIMIT corpus as mentioned in this paper contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, including time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance.
Journal ArticleDOI

Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources

TL;DR: In this paper, a method of constructing a single signal subspace for high-resolution estimation of the angles of arrival of multiple wide-band plane waves is presented, which relies on an approximately coherent combination of the spatial signal spaces of the temporally narrow-band decomposition of the received signal vector from an array of sensors.
Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "Direction of arrival estimation in the spherical harmonic domain using subspace pseudo-intensity vectors" ?

In this paper, the authors compared the performance of two intensity vector-based DOA estimation methods, namely Pseudo-Intensity Vectors ( PIV ) and Subspace Pseudo Intensity Vector ( SSPIV ), in the SH domain. 


The parameters specific to the proposed method were: $\sigma = 4^\circ$; $N_{K_\theta} = 91$ and $N_{K_\phi} = 180$, which corresponds to 2° resolution in azimuth and inclination; and $\lambda = 0.001/(\sigma\sqrt{2\pi})$, which removes entries >15° from the look direction.

Since SSPIV also requires significantly less computation than DPD-MUSIC at dense grid resolutions, it is particularly well suited to DOA estimation in situations involving multiple, moving speakers. 

For all methods (PIV, SSPIV, PWD-SRP and DPD-MUSIC) the corresponding spatial cost functions were computed over a 2D grid with 2° resolution in azimuth and inclination.

For moving sources the optimal length of observation interval is a trade-off between robustness to noise and the ability to follow the true source direction. 

It is assumed that the effective rank of $\hat{\mathbf{R}}_{\tilde{\mathbf{x}}_{lm}}(\nu, \ell)$ in those TF-regions which pass the DPD test is unity and so the noise subspace has dimension $(L+1)^2 - 1$.

These were arranged at approximately 60◦ intervals and their inclinations alternated to be above or below the horizontal plane of the array, according to whether they were seated or standing. 

The PWD beamformer maximizes the directivity index and is equivalent to the MVDR under the assumption of an uncorrelated diffuse noise field. 

To demonstrate the efficacy of the proposed methods, speech was recorded in a real room with dimensions of approximately 10.3×9.2×2.6 m and a reverberation time of 0.4 s. Speech signals were recorded using an Eigenmike 32 channel rigid spherical microphone array with radius 4.2 cm located close to the centre of the room. 

The error is highly dependent on all the factors but for any interferer angle the error is zero when cos |γ| = −g and increases as |γ| → 0◦ and |γ| → 180◦. 

The effect of estimation errors in the spatial covariance matrices is addressed through numerical simulations and real experiments in Sec. V and VI, respectively. 

This is especially apparent for miss rates between 0.25 and 0.5 where DPD-MUSIC averages 0.7-2.3 clutter measurements per time step whereas SSPIV averages less than 0.3.

These took {0.0073, 0.0122, 0.0726, 0.2954} s and {0.0040, 0.0181, 0.3051, 2.9103} s, respectively, to compute for grid resolutions {10°, 5°, 2°, 1°}.

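A sparse dictionary is enforced by setting entries smaller than $\lambda$ to zero, i.e. $\hat{K}_{j_\theta, j_\phi}(\varphi) = \begin{cases} 0 & K_{j_\theta, j_\phi}(\varphi) < \lambda \\ K_{j_\theta, j_\phi}(\varphi) & \text{otherwise} \end{cases}$.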

using Nd = 1 (and 4) a single (set of) estimated DOA(s) was obtained for each trial by setting the observation interval to the full length of the signal (4 seconds). 

So as to be relevant to practical scenarios with moving sound sources, in the second scenario, two sources were recorded whilst moving around a radius of 1.5 m.