What are the parameters specific to the proposed method?

The parameters specific to the proposed method were: σ = 4°; N_Kθ = 91 and N_Kφ = 180, which correspond to 2° resolution in inclination and azimuth, respectively; and λ = 0.001/(σ√(2π)), which removes entries more than 15° from the look direction.
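The quoted threshold can be checked with a little arithmetic. This sketch assumes, because the summary above does not say so, that the smoothing kernel is a Gaussian in great-circle distance with standard deviation σ, so its peak is 1/(σ√(2π)) and λ is 0.001 of that peak. The helper names are illustrative only.

```python
import math


def kernel_cutoff_deg(sigma_deg: float, rel_threshold: float = 0.001) -> float:
    """Distance (deg) at which an assumed Gaussian kernel exp(-d^2/(2 sigma^2))/(sigma sqrt(2 pi))
    falls to lambda = rel_threshold / (sigma sqrt(2 pi))."""
    # exp(-d^2 / (2 sigma^2)) = rel_threshold  =>  d = sigma * sqrt(-2 ln(rel_threshold))
    return sigma_deg * math.sqrt(-2.0 * math.log(rel_threshold))


def grid_size(res_deg: float) -> tuple:
    """Inclination samples 0..180 deg inclusive, azimuth samples 0..360 deg exclusive."""
    return int(round(180 / res_deg)) + 1, int(round(360 / res_deg))


sigma = 4.0
lam = 0.001 / (sigma * math.sqrt(2 * math.pi))
cutoff = kernel_cutoff_deg(sigma)          # about 14.87 deg, i.e. roughly the 15 deg quoted
n_theta, n_phi = grid_size(2.0)            # 91 inclinations x 180 azimuths
print(f"lambda = {lam:.3e}, cutoff = {cutoff:.2f} deg, grid = {n_theta} x {n_phi}")
```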

What is the computational cost of calculating SSPIV?

Since SSPIV also requires significantly less computation than DPD-MUSIC at dense grid resolutions, it is particularly well suited to DOA estimation in situations involving multiple, moving speakers.

Over what grid were the spatial cost functions computed?

For all methods (PIV, SSPIV, PWD-SRP and DPD-MUSIC), the corresponding spatial cost functions were computed over a 2D grid with 2° resolution in azimuth and inclination.

How should the observation interval be chosen for moving sources?

For moving sources, the optimal length of the observation interval is a trade-off between robustness to noise and the ability to follow the true source direction.

What is the effect of the DPD test on the noise subspace?

It is assumed that the effective rank of R̂_x̃lm(ν, ℓ) in those TF-regions which pass the DPD test is unity, and so the noise subspace has dimension (L + 1)² − 1.

How were the speakers arranged?

These were arranged at approximately 60° intervals and their inclinations alternated to be above or below the horizontal plane of the array, according to whether they were seated or standing.

Which beamformer maximizes the directivity index?

The PWD beamformer maximizes the directivity index and is equivalent to the MVDR beamformer under the assumption of an uncorrelated diffuse noise field.

How was speech recorded in the real-room experiments?

To demonstrate the efficacy of the proposed methods, speech was recorded in a real room with dimensions of approximately 10.3 × 9.2 × 2.6 m and a reverberation time of 0.4 s. Speech signals were recorded using a 32-channel Eigenmike rigid spherical microphone array with radius 4.2 cm, located close to the centre of the room.

When is the error angle zero?

The error is highly dependent on all the factors, but for any interferer angle the error is zero when cos |γ| = −g, and it increases as |γ| → 0° and |γ| → 180°.

What is the effect of estimation errors in the spatial covariance matrices?

The effect of estimation errors in the spatial covariance matrices is addressed through numerical simulations and real experiments in Sec. V and VI, respectively.

How does the clutter produced by DPD-MUSIC compare with that of SSPIV?

This is especially apparent for miss rates between 0.25 and 0.5, where DPD-MUSIC averages 0.7-2.3 clutter measurements per time step whereas SSPIV averages less than 0.3.

How long did computation take at grid resolutions of 10°, 5°, 2° and 1°?

These took {0.0073, 0.0122, 0.0726, 0.2954} s and {0.0040, 0.0181, 0.3051, 2.9103} s, respectively, to compute for grid resolutions {10°, 5°, 2°, 1°}.
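The two sets of timings can be related to the number of grid points each resolution implies. This sketch assumes inclination is sampled from 0° to 180° inclusive and azimuth from 0° up to (but excluding) 360°, which reproduces the 91 × 180 grid quoted for 2°. The excerpt does not say which methods the two sets belong to, so they are simply labelled first and second.

```python
# Timings quoted above (seconds), for grid resolutions of 10, 5, 2 and 1 degrees.
resolutions = [10, 5, 2, 1]
times_first = [0.0073, 0.0122, 0.0726, 0.2954]
times_second = [0.0040, 0.0181, 0.3051, 2.9103]


def grid_points(res_deg: int) -> int:
    """Grid points, assuming inclination 0..180 deg inclusive and azimuth 0..360 deg exclusive."""
    return (180 // res_deg + 1) * (360 // res_deg)


points = [grid_points(r) for r in resolutions]
for r, n, t1, t2 in zip(resolutions, points, times_first, times_second):
    # Time per grid point in microseconds for each set of timings
    print(f"{r:>2} deg: {n:>6} points, {1e6 * t1 / n:6.2f} us/pt (first), "
          f"{1e6 * t2 / n:6.2f} us/pt (second)")

# Growth factors from the 10 deg grid to the 1 deg grid
print(points[-1] / points[0], times_first[-1] / times_first[0], times_second[-1] / times_second[0])
```

Going from 10° to 1° multiplies the number of grid points by about 95, while the first set of timings grows by about 40 and the second by about 728.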

How is a sparse dictionary enforced?

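A sparse dictionary is enforced by setting entries smaller than λ to zero, i.e.

$$\hat{K}_{j_\theta,j_\phi}(\varphi) = \begin{cases} 0, & K_{j_\theta,j_\phi}(\varphi) < \lambda \\ K_{j_\theta,j_\phi}(\varphi), & \text{otherwise.} \end{cases}$$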
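A minimal sketch of how such a thresholded dictionary could be used to build a smoothed DOA histogram on the 2° grid, taking the histogram peak as the DOA estimate. The Gaussian kernel form, the function names and the example estimates are assumptions for illustration, not the paper's implementation.

```python
import math

SIGMA = 4.0                                     # kernel width, degrees
LAM = 0.001 / (SIGMA * math.sqrt(2 * math.pi))  # sparsity threshold lambda
RES = 2.0                                       # grid resolution, degrees


def unit_vector(incl_deg, azim_deg):
    t, p = math.radians(incl_deg), math.radians(azim_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))


def sparse_kernel(u, v):
    """Hypothetical dictionary entry: Gaussian of the great-circle distance between unit
    vectors u and v, set to zero wherever it falls below LAM."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    d = math.degrees(math.acos(dot))
    k = math.exp(-d * d / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))
    return 0.0 if k < LAM else k


# 91 x 180 grid of (inclination, azimuth) look directions
GRID = [(i * RES, j * RES) for i in range(int(180 / RES) + 1) for j in range(int(360 / RES))]
GRID_VECS = [unit_vector(t, p) for t, p in GRID]


def smoothed_histogram(doa_estimates):
    """Accumulate the sparse kernel of each (inclination, azimuth) estimate over the grid."""
    hist = [0.0] * len(GRID)
    for incl, azim in doa_estimates:
        v = unit_vector(incl, azim)
        for g, gv in enumerate(GRID_VECS):
            hist[g] += sparse_kernel(gv, v)
    return hist


# Usage: five estimates clustered near (60, 40) deg and one outlier
estimates = [(59, 41), (61, 39), (60, 40), (62, 42), (58, 38), (120, 250)]
hist = smoothed_histogram(estimates)
peak = max(range(len(hist)), key=hist.__getitem__)
print("peak at", GRID[peak], "non-zero bins:", sum(h > 0 for h in hist))
```

The outlier forms its own, smaller bump; because most estimates cluster near (60°, 40°), that grid point carries the largest histogram value.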

How many DOAs were estimated for each trial?

Using N_d = 1 (and 4), a single (set of) estimated DOA(s) was obtained for each trial by setting the observation interval to the full length of the signal (4 s).

How many sources were recorded in the second scenario?

So as to be relevant to practical scenarios with moving sound sources, in the second scenario, two sources were recorded whilst moving around a radius of 1.5 m.

(Open Access) Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors (2017) | Alastair H. Moore

Q: What are the contributions mentioned in the paper "Direction of arrival estimation in the spherical harmonic domain using subspace pseudo-intensity vectors" ?

In this paper, the authors present two intensity-vector-based DOA estimation methods that operate in the SH domain, namely Pseudo-Intensity Vectors (PIVs) and Subspace Pseudo-Intensity Vectors (SSPIVs), and compare their accuracy against the state of the art.

Q: Why is the SH domain representation of a sound field useful?

The SH domain representation of the plane-wave density, as expressed in (7), is useful because the steering vectors, y(Ψn), are analytic functions which are independent of frequency.

Q: What is the SH representation of a sound field?

The SH representation of a sound field [4], [30] around a particular point in space is determined by the complex-valued plane-wave density a(k, θ, φ), which is a function of wavenumber k, inclination θ and azimuth φ.

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1

Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors

Alastair H. Moore, Member, IEEE, Christine Evers, Senior Member, IEEE, and Patrick A. Naylor, Senior Member, IEEE

Abstract—Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.

Index Terms—Direction of arrival estimation, DOA localization, speaker tracking, robot audition, microphone array processing, spherical microphone array, spherical harmonics

I. INTRODUCTION

Many applications of acoustic signal processing rely on Direction of Arrival (DOA) estimation, including spatial filtering, speech dereverberation, source separation and diarization. Estimation of the DOA of a sound source is particularly important in the context of robot audition, where tracking the directions of one or more moving sources enables an ‘awareness’ of the local environment, which is a requirement for effective human-robot interaction.

Estimating both the vertical and horizontal angles of arrival requires a three-dimensional microphone array. Array geometries which sample the sound field such that it can be represented in the Spherical Harmonic (SH) domain are attractive because this representation allows the sound field to be analyzed with equal resolution in all directions using algorithms which are independent of the specific array geometry [1]–[4].

A. H. Moore, C. Evers and P. A. Naylor are with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: alastair.h.moore@imperial.ac.uk; c.evers@imperial.ac.uk; p.naylor@imperial.ac.uk).

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 609465.

This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/M026698/1].

A wide variety of DOA estimation algorithms have been proposed for use in the SH domain [5]–[14]. Most of these compute a metric over a dense azimuth-inclination grid before identifying its peak(s) as the DOA(s). Such methods include those that compute the Steered Response Power (SRP) due to a beamformer which is steered towards all potential source directions and those that compute the spatial spectrum using subspace methods based on Multiple Signal Classification (MUSIC) [15].

Many current DOA estimation methods make use of the spatial covariance matrix [7], [9], [10], [12]. For example, the SRP map produced by a Minimum Variance Distortionless Response (MVDR) beamformer optimally rejects background noise for each look direction by adjusting its beam pattern according to the spatial covariance matrix, while MUSIC [15] directly decomposes the spatial covariance matrix into signal and noise subspaces. However, in reverberation, coherent reflections distort the spatial covariance matrix. For the MVDR beamformer this manifests as incorrectly placed attenuation in the beam pattern. For MUSIC, the fact that the reflections are linearly dependent on the direct-path signals means that the rank of the covariance matrix is reduced and the division between signal and noise subspaces can be prone to errors.

Frequency Smoothing (FS) [16] has been shown to improve the accuracy of DOA estimation using MUSIC [7] and MVDR-SRP [10]. The procedure decorrelates coherent reflections by combining information across multiple frequency bands. In the spatial domain, where microphone signals are processed directly, special focussing matrices and an initial DOA estimate are required. In the SH domain, FS can be applied as a straightforward average by assuming frequency independence of the (mode strength compensated) array manifold [7], [10].
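To illustrate why averaging over frequency decorrelates a coherent reflection, the sketch below models the reflection as a delayed, attenuated copy of the direct path (a hypothetical pure-delay model with an assumed 2 ms extra delay and gain 0.6) and computes the magnitude of their correlation coefficient after averaging the 2 × 2 source covariance over frequency bins. Because the mode-strength-compensated steering vectors are assumed frequency independent, averaging the spatial covariance over frequency amounts to averaging this source covariance.

```python
import cmath
import math


def coherence_after_smoothing(freqs_hz, delay_s, gain=0.6):
    """|Correlation coefficient| between a direct path s (unit power in every bin) and a
    reflection r = gain * exp(-i 2 pi f delay) * s, after averaging their 2x2 source
    covariance over the given frequency bins (hypothetical pure-delay reflection)."""
    r_ss = r_rr = 0.0
    r_sr = 0j
    for f in freqs_hz:
        h = gain * cmath.exp(-2j * math.pi * f * delay_s)  # reflection transfer function
        r_ss += 1.0              # E{|s|^2}
        r_rr += abs(h) ** 2      # E{|r|^2} = |h|^2
        r_sr += h.conjugate()    # E{s r^*} = h^*
    return abs(r_sr) / math.sqrt(r_ss * r_rr)


delay = 0.002                                    # assumed extra path delay: 2 ms
bins = [1000.0 + 31.25 * k for k in range(32)]   # 32 bins, 1000 to 1968.75 Hz
single_bin = coherence_after_smoothing(bins[:1], delay)
partial = coherence_after_smoothing(bins[:8], delay)
smoothed = coherence_after_smoothing(bins, delay)
print(f"1 bin: {single_bin:.3f}, 8 bins: {partial:.3f}, 32 bins: {smoothed:.1e}")
```

In one bin the reflection is fully coherent with the direct path; across bins that span whole cycles of the reflection's phase (here 32 bins, two cycles), the averaged correlation vanishes.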

To estimate multiple source DOAs, a number of authors have proposed methods which exploit the sparsity of speech in the Time-Frequency (TF) domain. By identifying TF-regions where a single source is dominant, single-source DOA estimation methods can be employed locally to those regions [12], [17], [18]. This class of methods exploits the principle that, for a single dominant source, the rank of the spatial covariance matrix is unity. In [18] pairwise correlations between adjacent microphones of a circular array were estimated by averaging over frequency bins within a single time frame. In [17] the spatial covariance matrix between all microphones was estimated at each frequency bin by averaging over time frames. In [12] it was shown that estimating the spatial covariance matrix by averaging (smoothing) over time and frequency decorrelates the reflections, so the rank is unity only when a single direct path is dominant. Accuracy of the subsequent DOA estimation is substantially improved, but the Direct-Path Dominance (DPD) test described in [12] is reported to be passed in only 3% of TF-regions. This may lead to time frames in which there are no DOA estimates, which is problematic in applications where the sources are moving.

Methods for DOA estimation in the SH domain which exploit the directional sparsity of sound sources have been proposed in a series of related works [19]–[22]. In [19] Independent Component Analysis (ICA) of the SH domain signals was performed and the DOAs were estimated by comparing the columns of the unmixing matrix to the steering vectors for plane waves from all possible directions. In [20] the directional component of the SH domain signals was obtained by subtracting an estimate of the diffuse component, which was determined using a subspace approach. An iterative optimization was then performed to find a sparse set of weights for a dense dictionary of plane-wave elements. The directions associated with the selected elements represent the estimated DOAs. In [21] and [22] various approaches to combining the methods of [19] and [20] were proposed, each with its own success in a particular application scenario. However, none of these included live recordings of real-world audio, where small source movements may be important.

Intensity-based DOA estimation [8], [11], [13], [14] differs from the previously discussed methods because, by directly computing the direction of energy flow, there is no need to compute a spatial cost function. This has the potential for significant computational savings. The component of intensity in a particular direction has been measured using two types of intensity probe [23]. One approximates particle velocity using the difference between two closely spaced omnidirectional pressure sensors, while the other measures particle velocity directly [24]. The former approach is more common but is sensitive to phase mismatch and sensor noise. Using an array of intensity probes yields an intensity vector in 2 or 3 dimensions, from which the DOA can be found [25], [26].

In [8] DOA estimation using a spherical microphone array was proposed whereby a large number of microphones was used to transform the sound field into the SH domain, from which the particle velocity was approximated. The resulting vectors were termed Pseudo-Intensity Vectors (PIVs). Those initial results demonstrated the effectiveness of the method for single-source DOA estimation in a noise-free environment. In [11] DOA estimation of multiple sources was achieved using k-means clustering of PIVs. In both [27] and [28] a DOA was obtained for each TF-bin by first finding the PIV and then refining the direction by evaluating a cost function using higher-order SHs over a spatially constrained grid around the PIV direction. The final estimates of multiple sources’ DOAs were then obtained by identifying the peaks of a histogram of all the individual direction estimates.

In this paper we review the formulation and use of PIVs presented in [8]. We then provide a novel formulation of the PIV which follows directly from the SH domain representation of a sound field expressed in Cartesian form, and develop an extended analysis of PIVs under non-ideal conditions. Further, we propose the Subspace Pseudo-Intensity Vector (SSPIV), which we show to be more robust to noise and reverberation than the PIV. Like DPD-MUSIC, it exploits FS and subspace decomposition and assumes TF-sparsity of the input signal. However, by directly computing a DOA for each TF-region, rather than evaluating the spatial spectrum over all possible directions, it is computationally more efficient. We investigate the criteria under which smoothed histograms of PIVs and SSPIVs give accurate estimates of the DOAs of multiple sources in a noisy reverberant environment, including when sources are moving. Some of the first steps of an earlier version of the SSPIV method were presented in [13] and [29]. The current paper extends both the theoretical analysis and the evaluation of the PIV method compared to [8], especially in the context of multiple and moving speakers and in real-world applications.

The remainder of this paper is organized as follows. Sec. II reviews the SH domain representation of a sound field. Sec. III presents the PIV and SSPIV methods. Sec. IV analyses PIV and SSPIV under non-ideal conditions, whether these be caused by an interfering (independent or correlated) sound source, diffuse noise or sensor noise. Sec. V presents simulated experiments comparing the intensity-based methods to classical and state-of-the-art DOA estimation methods. Sec. VI demonstrates the effectiveness of the methods in real-world tests. Finally, the paper is concluded in Sec. VII.

II. REVIEW OF SH REPRESENTATION OF A SOUND FIELD

The SH representation of a sound field [4], [30] around a particular point in space is determined by the complex-valued plane-wave density a(k, θ, φ), which is a function of wavenumber k, inclination θ and azimuth φ. A unit vector pointing towards the n-th plane wave, $\mathbf{x}_n = [x_n, y_n, z_n]^T$, where $(\cdot)^T$ is the transpose operator, has DOA, $\Psi_n = (\theta_n, \phi_n)$, given by

$$\theta_n = \arccos(z_n), \qquad \phi_n = \operatorname{arctan2}(y_n, x_n) \tag{1}$$

where arctan2 is the arctangent function mapped to the correct quadrant according to the signs of $x_n$ and $y_n$. A plane-wave density composed of N plane waves is given by

$$a(k,\theta,\phi) = \sum_{n=1}^{N} \delta(\cos\theta - \cos\theta_n)\,\delta(\phi - \phi_n)\, s_n(k) \tag{2}$$

where $s_n(k)$ is the amplitude of the n-th plane wave and δ(cos θ)δ(φ) is the Dirac delta function on the sphere, which is zero everywhere on the sphere except (θ, φ) = (π/2, 0). The complex SHs of order l and degree m ∈ {−l, ..., l} provide a set of orthogonal basis functions defined over the unit sphere [30]

$$Y_l^m(\theta,\phi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi} \tag{3}$$

A. H. MOORE et al.: DIRECTION OF ARRIVAL ESTIMATION IN THE SH DOMAIN 3

where $P_l^m(\cdot)$ is the associated Legendre function, such that

$$a(k,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}(k)\, Y_l^m(\theta,\phi). \tag{4}$$

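Substituting Ω = (θ, φ), the weights of each SH are the Spherical Fourier Transform (SFT) of a(k, θ, φ)

$$a_{lm}(k) = \int_{\Omega\in\mathcal{S}^2} a(k,\Omega)\,[Y_l^m(\Omega)]^*\, d\Omega \tag{5}$$

where $\int_{\Omega\in\mathcal{S}^2} d\Omega = \int_0^{2\pi}\!\int_0^{\pi} \sin\theta\, d\theta\, d\phi$ is the integral over the unit sphere and $(\cdot)^*$ denotes conjugation. Substituting (2) into (5) gives

$$a_{lm}(k) = \sum_{n=1}^{N} [Y_l^m(\Psi_n)]^*\, s_n(k). \tag{6}$$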

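Considering the $(L+1)^2$ SHs up to $l \le L$, (6) is expressed in stacked vector notation as [12]

$$\mathbf{a}_{lm}(k) = \mathbf{Y}^H(\Psi)\,\mathbf{s}(k) \tag{7}$$

where subscript lm on a vector denotes that the elements are SH coefficients, $\mathbf{s}(k) = [s_1(k) \ldots s_N(k)]^T$, $\Psi = [\Psi_1 \ldots \Psi_N]$,

$$\mathbf{Y}(\Psi) = \begin{bmatrix} \mathbf{y}(\Psi_1) \\ \vdots \\ \mathbf{y}(\Psi_N) \end{bmatrix}, \tag{8}$$

$$\mathbf{y}(\Psi_n) = \begin{bmatrix} Y_0^0(\Psi_n) & Y_1^{-1}(\Psi_n) & Y_1^0(\Psi_n) & Y_1^1(\Psi_n) & \cdots & Y_L^L(\Psi_n) \end{bmatrix}$$

and $(\cdot)^H$ denotes the conjugate transpose.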

The SH domain representation of the plane-wave density, as expressed in (7), is useful because the steering vectors, $\mathbf{y}(\Psi_n)$, are analytic functions which are independent of frequency.
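The definitions in (3), (7) and (8) can be sketched in plain Python as follows. This assumes the Condon-Shortley phase in the associated Legendre function, which is the convention consistent with the Cartesian forms given later in (18); all function names are illustrative. No wavenumber appears anywhere, reflecting the frequency independence of the steering vectors.

```python
import cmath
import math


def legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for m >= 0, with the Condon-Shortley phase."""
    s = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for i in range(1, m + 1):                    # P_m^m = (-1)^m (2m-1)!! (1-x^2)^(m/2)
        pmm *= -(2 * i - 1) * s
    if l == m:
        return pmm
    p_prev, p = pmm, x * (2 * m + 1) * pmm       # P_{m+1}^m
    for ll in range(m + 2, l + 1):               # upward recurrence in degree
        p_prev, p = p, ((2 * ll - 1) * x * p - (ll + m - 1) * p_prev) / (ll - m)
    return p


def sph_harm(l, m, theta, phi):
    """Complex SH Y_l^m(theta, phi) as in (3); negative m via Y_l^{-m} = (-1)^m [Y_l^m]^*."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def steering_vector(theta, phi, L):
    """Row vector y(Psi) of (8): [Y_0^0, Y_1^-1, Y_1^0, Y_1^1, ..., Y_L^L]."""
    return [sph_harm(l, m, theta, phi) for l in range(L + 1) for m in range(-l, l + 1)]


def plane_wave_coeffs(doas, amplitudes, L):
    """a_lm = Y^H(Psi) s as in (7), i.e. the sum over n of conj(y(Psi_n)) * s_n."""
    a = [0j] * (L + 1) ** 2
    for (theta, phi), s in zip(doas, amplitudes):
        for i, y in enumerate(steering_vector(theta, phi, L)):
            a[i] += y.conjugate() * s
    return a


# Usage: one unit-amplitude plane wave from (theta, phi) = (1.0, 0.3) rad
theta, phi, L = 1.0, 0.3, 3
a = plane_wave_coeffs([(theta, phi)], [1.0], L)
x, y = math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi)
print(a[1], math.sqrt(3 / (8 * math.pi)) * (x + 1j * y))   # a_{1(-1)}, cf. (18a)
```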

In order to obtain this representation, the sound field in the vicinity of the point of interest must be observed. The pressure at a particular point is related to the plane-wave density by the mode strength, which depends on the distance of the point from the origin and whether a rigid scatterer is present [2], [3], [30]. Although irregular sampling schemes are possible, for mathematical convenience we use the pressure on the surface of a sphere of radius r centered at the origin, p(k, r, Ω), for which the mode strength can be denoted $b_l(kr)$. The SFT of this function is

$$\mathbf{p}_{lm}(k,r) = \mathbf{B}(kr)\,\mathbf{a}_{lm}(k) \tag{9}$$

where $\mathbf{B}(kr) = \operatorname{diag}\{b_0, b_1, b_1, b_1, \ldots, b_L\}$, $\mathbf{p}_{lm}(k,r) = [p_{00}\; p_{1(-1)} \ldots p_{LL}]^T$ is a vector of SH coefficients and the functional dependence of the stacked terms has been omitted for clarity. Sampling p(k, r, Ω) at Q points with directions $\{\Omega_q\}_{q=1}^{Q}$, the SFT is approximated using the discrete SFT [4]

$$\mathbf{p}_{lm}(k,r) \approx \mathbf{Y}^H(\Omega)\,\mathbf{W}\,\mathbf{p}(k,r) \tag{10}$$

where $\mathbf{p}(k,r) = [p_1 \ldots p_Q]^T$ is the pressure at each of the sample points, $\mathbf{W} = \operatorname{diag}\{w_1 \ldots w_Q\}$, where $\{w_q\}_{q=1}^{Q}$ are the weights of the sampling scheme, and $\mathbf{Y}(\Omega)$ is a $Q \times (L+1)^2$ matrix defined as in (8) but with the SHs evaluated at $\{\Omega_q\}_{q=1}^{Q}$. For the approximation in (10) to hold up to the maximum spherical harmonic order, L, there must be sufficient microphones, $Q \ge (L+1)^2$, and they must be adequately distributed over the sphere [31]. Furthermore, for a given radius, the error in the approximation of (10) increases with frequency. In practice the upper threshold is commonly taken as kr < L [7], [12], although avoiding spatial aliasing requires kr ≪ L [2], [31]. It has also been shown that accurately reproducing the pressure at a point due to a plane wave using the inverse SFT requires a much more conservative threshold [32].
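As a rough worked example for the Eigenmike used in the real-room recordings (Q = 32, r = 4.2 cm), the sketch below finds the largest order allowed by Q ≥ (L + 1)² and the frequency below which kr < L holds. The speed of sound of 343 m/s is an assumed value, and the order actually used in the experiments is not stated in this excerpt; Q ≥ (L + 1)² is only a necessary condition, and the stricter kr ≪ L would give a much lower limit.

```python
import math

Q = 32        # microphones on the Eigenmike used in the real-room recordings
r = 0.042     # array radius in metres (4.2 cm)
c = 343.0     # speed of sound in m/s (assumed value)

L_max = math.isqrt(Q) - 1                  # largest L satisfying Q >= (L + 1)^2
f_upper = L_max * c / (2 * math.pi * r)    # kr < L with k = 2 pi f / c
print(f"L_max = {L_max}, kr < L_max holds below about {f_upper:.0f} Hz")
```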

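Equating (9) and (10), the plane-wave density can be obtained as

$$\mathbf{a}_{lm}(k) = \mathbf{B}^{-1}(kr)\,\mathbf{Y}^H(\Omega)\,\mathbf{W}\,\mathbf{p}(k,r). \tag{11}$$

Let $\mathbf{x}(k,r) = \mathbf{p}(k,r) + \mathbf{v}(k)$ be the observation of $\mathbf{p}(k,r)$ in the presence of sensor noise, which we assume to be zero-mean, normally distributed, uncorrelated between sensors and uncorrelated with $\mathbf{s}(k)$. Applying the SFT and compensating for the mode strength, the observed plane-wave density is

$$\tilde{\mathbf{x}}_{lm}(k) = \mathbf{Y}^H(\Psi)\,\mathbf{s}(k) + \tilde{\mathbf{v}}_{lm}(k) \tag{12}$$

where

$$\tilde{\mathbf{v}}_{lm}(k) = \mathbf{B}^{-1}(kr)\,\mathbf{Y}^H(\Omega)\,\mathbf{W}\,\mathbf{v}(k). \tag{13}$$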

III. PSEUDO-INTENSITY VECTOR FORMULATION

The pseudo-intensity vector was proposed in [8] as an approximation to the active intensity vector. This approach is reviewed in Sec. III-A, while in Sec. III-B an equivalent vector is derived directly from the SH representation of the sound field. Finally, in Sec. III-C, the SSPIV is formulated.

A. Review of sound intensity and pseudo-intensity

The active intensity vector is defined as the time-averaged magnitude and direction of the net flow of energy and is given by [30]

$$\tilde{\mathbf{I}}(k) = \frac{1}{2}\Re\left\{ p^*(k)\,\mathbf{u}(k) \right\} \tag{14}$$

where p(k) is the omnidirectional pressure, $\mathbf{u}(k) = [u_x(k)\; u_y(k)\; u_z(k)]^T$ is a vector of the particle velocities in the Cartesian directions and $\Re\{\cdot\}$ is the real operator. It is useful for DOA estimation because acoustic energy flows in the direction of wave propagation. For a plane wave, the particle velocity vector is related to the direction of arrival (θ, φ) as [8]

$$\mathbf{u}(k) = -\frac{p(k)}{\rho_0 c}\begin{bmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{bmatrix} \tag{15}$$

where $\rho_0$ and c are the ambient density and speed of sound in the medium, respectively. It can be seen that the elements of $\mathbf{u}(k)$ have dipole directivity patterns aligned with the Cartesian axes and that the resulting vector points in the opposite direction from the DOA.

A beamformer with a dipole directivity pattern can be obtained directly from first-order SH coefficients as

$$D(k,\varphi,\mathbf{a}_1(k)) = \sum_{m=-1}^{1} Y_1^m(\varphi)\, a_{1(m)}(k) \tag{16}$$

where φ is the steering direction. Therefore, to approximate (14) using the SH coefficients of the plane-wave density function, the PIV is formulated as [8]

$$\tilde{\mathbf{I}}(k) = \frac{1}{2}\Re\left\{ [a_{00}(k)]^* \begin{bmatrix} D(k,\varphi_{-x},\mathbf{a}_1(k)) \\ D(k,\varphi_{-y},\mathbf{a}_1(k)) \\ D(k,\varphi_{-z},\mathbf{a}_1(k)) \end{bmatrix} \right\} \tag{17}$$

where $\varphi_{-x} = (\pi/2, \pi)$, $\varphi_{-y} = (\pi/2, -\pi/2)$ and $\varphi_{-z} = (\pi, 0)$.

B. Alternative formulation of PIV

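From (6), the plane-wave decomposition for the n-th plane wave is $a^{(n)}_{lm}(k) = [Y_l^m(\Psi_n)]^*\, s_n(k)$. Expressing the first-order coefficients in Cartesian form gives

$$a^{(n)}_{1(-1)}(k) = s_n(k)\sqrt{\tfrac{3}{8\pi}}\,(x_n + i y_n) \tag{18a}$$

$$a^{(n)}_{10}(k) = s_n(k)\sqrt{\tfrac{3}{4\pi}}\, z_n \tag{18b}$$

$$a^{(n)}_{11}(k) = s_n(k)\sqrt{\tfrac{3}{8\pi}}\,(-x_n + i y_n) \tag{18c}$$

where the SHs are evaluated on the unit sphere. Rearranging (18) gives

$$s_n(k)\,x_n = \frac{1}{2}\sqrt{\frac{8\pi}{3}}\left(a^{(n)}_{1(-1)}(k) - a^{(n)}_{11}(k)\right) \tag{19a}$$

$$s_n(k)\,y_n = \frac{1}{2i}\sqrt{\frac{8\pi}{3}}\left(a^{(n)}_{1(-1)}(k) + a^{(n)}_{11}(k)\right) \tag{19b}$$

$$s_n(k)\,z_n = \sqrt{\frac{4\pi}{3}}\; a^{(n)}_{10}(k) \tag{19c}$$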

which can be interpreted as a weighted sum of the first-order plane-wave decomposition coefficients. Moreover, the weight corresponding to each $a^{(n)}_{1(m)}(k)$ is proportional to the order-1, degree-m SH evaluated in the required axial direction as

$$s_n(k)\,\varpi_n = \frac{4\pi}{3}\sum_{m=-1}^{1} Y_1^m(\varphi_\varpi)\, a^{(n)}_{1(m)}(k) \tag{20}$$

$$s_n(k)\,\varpi_n = \frac{4\pi}{3}\, D\big(k, \varphi_\varpi, \mathbf{a}^{(n)}_1(k)\big) \tag{21}$$

where (16) has been used to obtain (21), $\varpi \in \{x, y, z\}$, $\varphi_x = (\pi/2, 0)$, $\varphi_y = (\pi/2, \pi/2)$ and $\varphi_z = (0, 0)$. To obtain a vector pointing towards the n-th DOA, we note that $a^{(n)}_{00}(k) = s_n(k)/\sqrt{4\pi}$ and evaluate (21) for $\varpi \in \{x, y, z\}$, leading to

$$\mathbf{I}(k) = \frac{4\pi\sqrt{4\pi}}{3}\,\Re\left\{\big[a^{(n)}_{00}(k)\big]^* \begin{bmatrix} D(k, \varphi_x, \mathbf{a}^{(n)}_1(k)) \\ D(k, \varphi_y, \mathbf{a}^{(n)}_1(k)) \\ D(k, \varphi_z, \mathbf{a}^{(n)}_1(k)) \end{bmatrix}\right\} \tag{22}$$

$$\mathbf{I}(k) = \Re\left\{ |s_n(k)|^2\, \mathbf{x}_n \right\} \tag{23}$$

where, for a single plane wave in noise-free conditions, the argument to the real operator is intrinsically real, but the real operator may be needed in practical implementations with finite precision. The direction of the PIV in spherical coordinates can be extracted using (1) from the unit vector $\mathbf{I}(k)/\lVert\mathbf{I}(k)\rVert$, where $\lVert\cdot\rVert$ denotes the $\ell_2$-norm.

The formulation of (22) is structurally identical to (17), but $\tilde{\mathbf{I}}(k)$ and $\mathbf{I}(k)$ point in opposite directions due to the steering of the dipoles. Moreover, the inclusion of the $4\pi\sqrt{4\pi}/3$ normalizing constant in (22) leads to the simplified form of (23), which will make the notation of the subsequent analysis more straightforward. For historical reasons, $\mathbf{I}(k)$ is hereafter referred to as the PIV, but its orientation towards the DOA is preferred for simplicity of describing the methods.
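The chain from (16) to (23) can be checked numerically for a single plane wave. This sketch hard-codes the order-1 SHs with the Condon-Shortley phase (consistent with (18)), computes the PIV of (22) and recovers the DOA with (1). The function names and the test signal are hypothetical.

```python
import cmath
import math


def y1(m, theta, phi):
    """Order-1 complex SHs with the Condon-Shortley phase, matching (18)."""
    if m == -1:
        return math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(-1j * phi)
    if m == 0:
        return math.sqrt(3 / (4 * math.pi)) * math.cos(theta) + 0j
    return -math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * phi)


def dipole(steer, a1):
    """D(k, phi, a_1) of (16); steer = (theta, phi), a1 maps m -> a_{1(m)}."""
    return sum(y1(m, *steer) * a1[m] for m in (-1, 0, 1))


AXES = [(math.pi / 2, 0.0), (math.pi / 2, math.pi / 2), (0.0, 0.0)]   # phi_x, phi_y, phi_z


def piv(a00, a1):
    """PIV of (22): (4 pi sqrt(4 pi) / 3) Re{a00^* [D_x, D_y, D_z]}."""
    scale = 4 * math.pi * math.sqrt(4 * math.pi) / 3
    return [scale * (a00.conjugate() * dipole(ax, a1)).real for ax in AXES]


def doa(v):
    """(inclination, azimuth) of v via (1), after normalizing to unit length."""
    n = math.sqrt(sum(comp * comp for comp in v))
    x, y, z = (comp / n for comp in v)
    return math.acos(max(-1.0, min(1.0, z))), math.atan2(y, x)


# Usage: a single plane wave s = 0.8 exp(0.4i) arriving from (theta, phi) = (1.1, -2.0) rad
theta0, phi0 = 1.1, -2.0
s = 0.8 * cmath.exp(0.4j)
a00 = s / math.sqrt(4 * math.pi)                                    # [Y_0^0]^* s
a1 = {m: y1(m, theta0, phi0).conjugate() * s for m in (-1, 0, 1)}   # first-order terms of (6)
I = piv(a00, a1)                                                    # |s|^2 x_n, cf. (23)
print(I, doa(I))
```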

C. Subspace PIV

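The SSPIV extends the concept of PIVs to take advantage of higher-order SHs and frequency smoothing, and is aimed at providing more accurate and reliable DOA estimates in the presence of multiple and interfering sound sources and reverberation. It follows from (7) that the covariance of $\mathbf{a}_{lm}$ is [12]

$$\mathbf{R}_{\mathbf{a}} = E\left\{ \mathbf{a}_{lm}(k)\,\mathbf{a}_{lm}^H(k) \right\} \tag{24}$$

$$\mathbf{R}_{\mathbf{a}} = \mathbf{Y}^H(\Psi)\,\mathbf{R}_{\mathbf{s}}\,\mathbf{Y}(\Psi) \tag{25}$$

where $\mathbf{R}_{\mathbf{s}} = E\{\mathbf{s}(k)\,\mathbf{s}^H(k)\}$. Singular Value Decomposition (SVD) leads to

$$\mathbf{R}_{\mathbf{a}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H = [\mathbf{U}_s\;\mathbf{U}_n]\begin{bmatrix}\boldsymbol{\Sigma}_s & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_n\end{bmatrix}[\mathbf{U}_s\;\mathbf{U}_n]^H \tag{26}$$

where U is a unitary matrix, Σ is a diagonal matrix containing the singular values of $\mathbf{R}_{\mathbf{a}}$, and $\mathbf{U}_s$ and $\mathbf{U}_n$, respectively, represent the conventional partitioning into signal and noise subspaces [15]. In the simplest case of a single plane wave, $\mathbf{U}_s = [\hat{a}_{00}\; \hat{a}_{1(-1)}\; \hat{a}_{10} \ldots \hat{a}_{LL}]^T$ is a column vector and is proportional to the (conjugated) steering vector for the plane-wave DOA, $\mathbf{y}^H(\Psi_1)$. The SSPIV method applies the PIV method (cf. (22) and (16)) to the one-dimensional signal subspace as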

$$\hat{\mathbf{I}}(k) = \frac{4\pi\sqrt{4\pi}}{3}\,\Re\left\{ \hat{a}_{00}^* \begin{bmatrix} D(k,\varphi_x,\mathbf{U}_s) \\ D(k,\varphi_y,\mathbf{U}_s) \\ D(k,\varphi_z,\mathbf{U}_s) \end{bmatrix} \right\} \tag{27}$$

to obtain a vector pointing towards the source. Although (27) depends only on the zeroth- and first-order components of $\mathbf{U}_s$, through (25) and (26) their values depend on the higher-order SH terms of $\mathbf{a}_{lm}$. As with the PIV method, the benefit of this approach is that a direction is obtained for each TF-region directly, i.e. without evaluating all possible directions. The implications of violating the assumption that a single plane wave is present are addressed in Sec. IV.
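A minimal sketch of the SSPIV pipeline for one TF-region: average outer products of noisy SH-domain observations over time-frequency bins, take the dominant eigenvector as the one-dimensional signal subspace, and apply (27). Assumptions: a single source, white SH-domain noise (the noise in (13) is generally not white after mode-strength compensation), and power iteration as a stand-in for the SVD in (26). All names and parameter values are hypothetical.

```python
import cmath
import math
import random


def legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for m >= 0, with the Condon-Shortley phase."""
    s, pmm = math.sqrt(max(0.0, 1.0 - x * x)), 1.0
    for i in range(1, m + 1):
        pmm *= -(2 * i - 1) * s
    if l == m:
        return pmm
    p_prev, p = pmm, x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        p_prev, p = p, ((2 * ll - 1) * x * p - (ll + m - 1) * p_prev) / (ll - m)
    return p


def sh(l, m, theta, phi):
    """Complex SH Y_l^m of (3); negative m via Y_l^{-m} = (-1)^m [Y_l^m]^*."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - am) / math.factorial(l + am))
    y = norm * legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def steering(theta, phi, L):
    return [sh(l, m, theta, phi) for l in range(L + 1) for m in range(-l, l + 1)]


def principal_vector(R, iters=100):
    """Dominant eigenvector of a Hermitian PSD matrix by power iteration (stand-in for U_s)."""
    u = [1.0 + 0j] * len(R)
    for _ in range(iters):
        w = [sum(rij * uj for rij, uj in zip(row, u)) for row in R]
        norm = math.sqrt(sum(abs(v) ** 2 for v in w))
        u = [v / norm for v in w]
    return u


def sspiv(R):
    """SSPIV of (27): only u[0] (l = 0) and u[1..3] (l = 1, m = -1, 0, 1) enter the dipoles."""
    u = principal_vector(R)
    axes = [(math.pi / 2, 0.0), (math.pi / 2, math.pi / 2), (0.0, 0.0)]
    dips = [sum(sh(1, m, *ax) * u[2 + m] for m in (-1, 0, 1)) for ax in axes]
    scale = 4 * math.pi * math.sqrt(4 * math.pi) / 3
    return [scale * (u[0].conjugate() * d).real for d in dips]


def angle_deg(u, v):
    """Angle between 3-vectors via atan2(|u x v|, u . v), robust near 0 and 180 deg."""
    cx, cy, cz = u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz),
                                   sum(p * q for p, q in zip(u, v))))


# Usage: one source at (70, 30) deg, L = 2, covariance averaged over 200 TF bins
rng = random.Random(0)
L, theta0, phi0 = 2, math.radians(70), math.radians(30)
y = steering(theta0, phi0, L)
n, n_bins = len(y), 200
R = [[0j] * n for _ in range(n)]
for _ in range(n_bins):
    s = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    # Observation model (12) with white SH-domain noise (a simplifying assumption)
    a = [yi.conjugate() * s + complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01)) for yi in y]
    for i in range(n):
        for j in range(n):
            R[i][j] += a[i] * a[j].conjugate() / n_bins
v = sspiv(R)
x0 = [math.sin(theta0) * math.cos(phi0), math.sin(theta0) * math.sin(phi0), math.cos(theta0)]
print(f"SSPIV error: {angle_deg(v, x0):.3f} deg")
```

With these settings the error comes out well under a degree; reducing `n_bins` or raising the noise level shows how the estimate degrades.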

IV. PIV AND SSPIV DISTRIBUTIONS FOR REPRESENTATIVE EXAMPLE SOUND FIELDS

As described in Sec. II, an arbitrary sound field can be decomposed into a sum of plane waves. In this section we consider how the PIVs and SSPIVs are affected by amplitude, phase and directional relationships between two plane waves. These simplified cases provide some insight into the behavior of pseudo-intensity vectors in real acoustic environments.


Fig. 1. $\mathbf{I}$ for selected values of $|\beta_1 - \beta_2|$ with fixed $\alpha_1$, $\alpha_2$, $\mathbf{x}_1$ and $\mathbf{x}_2$. The source directions are shown for reference. [Plot not reproduced.]

A. Two plane waves - general case

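For two plane waves with DOAs given by the unit vectors $\mathbf{x}_n$, n ∈ {1, 2}, and source signals $s_n(k) = \alpha_n(k)\,e^{i\beta_n(k)}$, where $\alpha_n(k)$ and $\beta_n(k)$ are the magnitude and phase at the origin, respectively, the PIV is obtained from (2) and (22) as

$$\mathbf{I} = \Re\left\{ (s_1 + s_2)^*\,(s_1\mathbf{x}_1 + s_2\mathbf{x}_2) \right\} \tag{28}$$

$$\mathbf{I} = \alpha_1^2\mathbf{x}_1 + \alpha_2^2\mathbf{x}_2 + (\mathbf{x}_1 + \mathbf{x}_2)\,\alpha_1\alpha_2\cos(\beta_1 - \beta_2) \tag{29}$$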

where for brevity the dependence on k is assumed. This is interesting because it implies that the resulting vector lies on the plane containing the vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, but that it does not necessarily lie between the two. To illustrate this point, Fig. 1 shows $\mathbf{I}$ for various values of $|\beta_1 - \beta_2|$. The resulting vector is nominally distributed about the direction of the stronger source (i.e. $\mathbf{x}_1$) but is either drawn towards the direction of the weaker source (i.e. $\mathbf{x}_2$) or repelled from it, depending on the relative amplitudes and phases of the signals.
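The behaviour described above can be reproduced directly from (29). In this sketch the stronger source (α1 = 1) lies along +x and the weaker one (α2 = 0.5) along +y; the amplitudes and geometry are arbitrary illustrative choices.

```python
import math


def piv_two_waves(alpha1, alpha2, x1, x2, dbeta):
    """PIV of two plane waves, (29): a1^2 x1 + a2^2 x2 + (x1 + x2) a1 a2 cos(beta1 - beta2)."""
    c = alpha1 * alpha2 * math.cos(dbeta)
    return [alpha1 ** 2 * p + alpha2 ** 2 * q + (p + q) * c for p, q in zip(x1, x2)]


def signed_angle_deg(v):
    """Angle of v in the x-y plane, measured from x1 = +x towards x2 = +y."""
    return math.degrees(math.atan2(v[1], v[0]))


x1 = [1.0, 0.0, 0.0]   # stronger source, alpha1 = 1
x2 = [0.0, 1.0, 0.0]   # weaker source, alpha2 = 0.5, 90 deg away
results = {}
for dbeta_deg in (0, 90, 180):
    v = piv_two_waves(1.0, 0.5, x1, x2, math.radians(dbeta_deg))
    results[dbeta_deg] = v
    print(f"|beta1 - beta2| = {dbeta_deg:3d} deg -> PIV at {signed_angle_deg(v):+.2f} deg, z = {v[2]}")
```

For in-phase signals the PIV is pulled towards x2 (about +26.6°), for signals in quadrature it lies about 14° from x1, and for anti-phase signals it is pushed to the far side of x1 (about −26.6°); in every case its z-component is zero, so it stays in the plane of x1 and x2.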

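The SSPIV depends on the SVD of $\mathbf{R}_{\mathbf{a}}$, which is determined primarily by the source covariance,

$$\mathbf{R}_{\mathbf{s}} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix},$$

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two plane waves and $\sigma_{12} = \sigma_{21}^*$ is their covariance. The dimensionality of $\mathbf{R}_{\mathbf{a}}$ depends on the maximum SH order, L, but is independent of the number of plane waves.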

B. Uncorrelated sources

Consider two uncorrelated sources in a free field with fixed DOAs and amplitude ratio. We assume that $\beta_1$ and $\beta_2$ are independent with identical uniform distribution U(0, 2π), such that ∆β = β1 − β2 has a triangular distribution over the interval ∆β ∈ [−2π, 2π] which, due to the periodicity of the phase, reduces to a uniform distribution over ∆β ∈ [−π, π] with probability density p(∆β) = 1/(2π). The expected value of $\mathbf{I}$ is obtained by integrating (29) with respect to ∆β,

Fig. 2. Error in $E\{\mathbf{I}\}$ and $\hat{\mathbf{I}}$ for L = {1, 3, 7} as a function of $\angle(\mathbf{x}_1, \mathbf{x}_2)$ with (a) SIR 20 dB and (b) SIR 3 dB. [Plots not reproduced; both panels show error angle (deg) against interferer angle (deg, 0 to 180) for L = 1, 3 and 7.]

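$$E\{\mathbf{I}\} = \int_{-\pi}^{\pi} \mathbf{I}\, p(\Delta\beta)\, d\Delta\beta \tag{30}$$

$$E\{\mathbf{I}\} = \alpha_1^2\mathbf{x}_1 + \alpha_2^2\mathbf{x}_2 + (\mathbf{x}_1 + \mathbf{x}_2)\,\frac{\alpha_1\alpha_2}{2\pi}\int_{-\pi}^{\pi}\cos(\beta_1 - \beta_2)\, d\Delta\beta \tag{31}$$

$$E\{\mathbf{I}\} = \alpha_1^2\mathbf{x}_1 + \alpha_2^2\mathbf{x}_2. \tag{32}$$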
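The result in (32) can be checked by Monte Carlo simulation, drawing β1 and β2 independently and uniformly as assumed above and averaging (29). The amplitudes, the 60° source separation, the number of trials and the seed are arbitrary choices for illustration.

```python
import math
import random


def piv_two_waves(alpha1, alpha2, x1, x2, dbeta):
    """PIV of two plane waves, as in (29)."""
    c = alpha1 * alpha2 * math.cos(dbeta)
    return [alpha1 ** 2 * p + alpha2 ** 2 * q + (p + q) * c for p, q in zip(x1, x2)]


rng = random.Random(1)
alpha1, alpha2 = 1.0, 0.5
x1 = [1.0, 0.0, 0.0]
x2 = [math.cos(math.radians(60)), math.sin(math.radians(60)), 0.0]   # 60 deg apart

n_trials = 100_000
mean = [0.0, 0.0, 0.0]
for _ in range(n_trials):
    # Independent phases, each uniform on [0, 2 pi), as assumed above
    beta1, beta2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    v = piv_two_waves(alpha1, alpha2, x1, x2, beta1 - beta2)
    mean = [m + vi / n_trials for m, vi in zip(mean, v)]

expected = [alpha1 ** 2 * p + alpha2 ** 2 * q for p, q in zip(x1, x2)]   # (32)
print("Monte Carlo:", [round(m, 3) for m in mean], " (32):", [round(e, 3) for e in expected])
```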

The SSPIV is determined by the source covariance, $\mathbf{R}_{\mathbf{s}} \propto \begin{bmatrix}\alpha_1^2 & 0\\ 0 & \alpha_2^2\end{bmatrix}$, the DOAs and the maximum order of SHs considered. Without loss of generality, let $\mathbf{x}_1$ point in the direction of the desired source such that the Signal-to-Interference Ratio (SIR) in dB is $10\log_{10}(\alpha_1^2/\alpha_2^2) \ge 0$. Figure 2 shows the error angle $\angle(\mathbf{x}_1, \hat{\mathbf{I}})$ as a function of the interferer angle $\angle(\mathbf{x}_1, \mathbf{x}_2)$ for SIRs of 20 dB and 3 dB and for different values of L. These plots were produced by collating, without averaging, SSPIVs calculated according to (25), (26) and (27) for interferers at 794 approximately equally distributed directions and 5 random target directions. The variation in error as a function of interferer angle has multiple peaks and nulls corresponding to the number of lobes in the real (or imaginary) part of the highest-order SH considered, but is independent of the target direction. Increasing L reduces the worst-case error, which confirms that higher-order SHs are being utilized by the SSPIV. Also shown is $\angle(\mathbf{x}_1, E\{\mathbf{I}\})$, where $E\{\mathbf{I}\}$ is calculated according to (32).
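The experiment behind Fig. 2 can be approximated in a few dozen lines. This sketch places the target at the north pole and sweeps an uncorrelated interferer in 5° steps, forming R_a from (25) with R_s = diag(α1², α2²) and applying (27). Power iteration again stands in for the SVD, and the sweep, orders and SIR are illustrative choices rather than the 794-direction setup used for the figure.

```python
import cmath
import math


def legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for m >= 0, with the Condon-Shortley phase."""
    s, pmm = math.sqrt(max(0.0, 1.0 - x * x)), 1.0
    for i in range(1, m + 1):
        pmm *= -(2 * i - 1) * s
    if l == m:
        return pmm
    p_prev, p = pmm, x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        p_prev, p = p, ((2 * ll - 1) * x * p - (ll + m - 1) * p_prev) / (ll - m)
    return p


def sh(l, m, theta, phi):
    """Complex SH Y_l^m of (3); negative m via Y_l^{-m} = (-1)^m [Y_l^m]^*."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - am) / math.factorial(l + am))
    y = norm * legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def steering(theta, phi, L):
    return [sh(l, m, theta, phi) for l in range(L + 1) for m in range(-l, l + 1)]


def sspiv(R, iters=100):
    """Power iteration for the dominant eigenvector u of R, then the SSPIV of (27)."""
    u = [1.0 + 0j] * len(R)
    for _ in range(iters):
        w = [sum(rij * uj for rij, uj in zip(row, u)) for row in R]
        norm = math.sqrt(sum(abs(v) ** 2 for v in w))
        u = [v / norm for v in w]
    axes = [(math.pi / 2, 0.0), (math.pi / 2, math.pi / 2), (0.0, 0.0)]
    dips = [sum(sh(1, m, *ax) * u[2 + m] for m in (-1, 0, 1)) for ax in axes]
    return [(u[0].conjugate() * d).real for d in dips]   # constant scale omitted


def angle_to_z_deg(v):
    """Angle between v and +z via atan2(|v x z|, v . z)."""
    return math.degrees(math.atan2(math.hypot(v[0], v[1]), v[2]))


def sspiv_error_deg(gamma, L, sir_db):
    """Error for a target at the north pole and an uncorrelated interferer at inclination gamma."""
    p1, p2 = 1.0, 10 ** (-sir_db / 10)              # alpha_1^2 and alpha_2^2
    y1, y2 = steering(0.0, 0.0, L), steering(gamma, 0.0, L)
    n = len(y1)
    # R_a = Y^H R_s Y from (25), with R_s = diag(p1, p2)
    R = [[p1 * y1[i].conjugate() * y1[j] + p2 * y2[i].conjugate() * y2[j]
          for j in range(n)] for i in range(n)]
    return angle_to_z_deg(sspiv(R))


gammas = [math.radians(g) for g in range(0, 181, 5)]
worst = {L: max(sspiv_error_deg(g, L, 20.0) for g in gammas) for L in (1, 3)}
print("worst-case error at 20 dB SIR:", {L: round(e, 3) for L, e in worst.items()})
```

For L = 1 the two sources' SH steering vectors are orthogonal when cos ∠(x1, x2) = −1/3, so the interferer leaves the signal subspace untouched and the error is zero there, which is one of the nulls mentioned above.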

Figure 2(a) shows that PIVs and SSPIVs are both accurate to


References

Multiple emitter location and signal parameter estimation

Image method for efficiently simulating small‐room acoustics

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (NIST)

TIMIT Acoustic-Phonetic Continuous Speech Corpus

Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources

