(Open Access) Joint Late Reverberation and Noise Power Spectral Density Estimation in a Spatially Homogeneous Noise Field (2018) | Ina Kodrasi

JOINT LATE REVERBERATION AND NOISE POWER SPECTRAL DENSITY ESTIMATION IN A SPATIALLY HOMOGENEOUS NOISE FIELD

Ina Kodrasi*†, Simon Doclo*

* University of Oldenburg, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4All, Oldenburg, Germany

† Idiap Research Institute, Speech and Audio Processing Group, Martigny, Switzerland

{ina.kodrasi,simon.doclo}@uni-oldenburg.de

ABSTRACT

Many multi-channel dereverberation and noise reduction techniques

such as the multi-channel Wiener ﬁlter (MWF) require an estimate

of the late reverberation and noise power spectral densities (PSDs).

State-of-the-art multi-channel methods for estimating the late rever-

beration PSD typically assume that the noise PSD matrix is known.

Instead of assuming that the noise PSD matrix is known, in this pa-

per we model the noise as a spatially homogeneous sound ﬁeld with

an unknown time-varying PSD and a known time-invariant spatial

coherence matrix. Based on this model, two joint estimators of the

late reverberation and noise PSDs are proposed, i.e., a non-blocking-

based estimator which simultaneously estimates the target signal,

late reverberation, and noise PSDs, and a blocking-based estima-

tor which ﬁrst estimates the late reverberation and noise PSDs at the

output of a blocking matrix aiming to block the target signal. Ex-

perimental results show that the proposed blocking-based estimator

yields the best performance when used in an MWF, even resulting in

a similar or better performance than a state-of-the-art blocking-based

estimator of the late reverberation PSD which assumes that the noise

PSD matrix is known.

Index Terms— PSD estimation, late reverberation, noise,

MWF, least-squares

1. INTRODUCTION

In many hands-free speech communication applications, the recorded

microphone signals do not only contain the desired speech signal,

but also attenuated and delayed copies of the desired speech signal

due to reverberation, as well as additive noise. While early reverber-

ation may be desirable [1], late reverberation and noise may degrade

the perceived quality and hinder the intelligibility of speech [2, 3].

Hence, effective dereverberation and noise reduction techniques are

required.

A commonly used dereverberation and noise reduction tech-

nique is the multi-channel Wiener ﬁlter (MWF), which aims at

minimizing the mean-square error between the output signal and

the target signal [4–6]. The implementation of the MWF requires

(among other parameters) an estimate of the late reverberation

and noise power spectral densities (PSDs). To estimate the late

reverberation PSD, several single-channel estimators based on a

temporal model of reverberation [7–9] as well as multi-channel

estimators based on a diffuse sound ﬁeld model for the late rever-

beration [10–18] have been proposed. To the best of our knowledge,

state-of-the-art multi-channel late reverberation PSD estimators estimate the late reverberation PSD assuming that an estimate of the noise PSD matrix is available. The noise PSD matrix is typically estimated from the microphone signals during speech pauses detected by means of a voice activity detector (VAD) [19, 20], generally requiring the noise PSD to be rather time-invariant. However, in many acoustic scenarios, e.g., in highly reverberant environments, speech pauses may rarely occur, making the estimation of the noise PSD matrix challenging. In addition, in many acoustic scenarios the noise PSD can be time-varying, e.g., when the noise consists of microphone self-noise in a system with the input gain automatically adjusted during operation using an automatic gain control.

This work was supported by the Cluster of Excellence Hearing4All, funded by the German Research Foundation (DFG), and the joint Lower Saxony-Israeli Project ATHENA, funded by the State of Lower Saxony.

Instead of assuming that an estimate of the noise PSD matrix

is available, in this paper we model the noise as a spatially homo-

geneous sound ﬁeld with a time-varying PSD and assume that only

knowledge of the time-invariant spatial coherence matrix is avail-

able. Two alternative joint estimators of the late reverberation and

noise PSDs are proposed, i.e., a non-blocking-based estimator which

simultaneously estimates the target signal, late reverberation, and

noise PSDs, and a blocking-based estimator which ﬁrst estimates

only the late reverberation and noise PSDs at the output of a block-

ing matrix aiming to block the target signal. The proposed PSD es-

timators can be viewed as extensions of the PSD estimators in [10]

and [16], where only the target signal and late reverberation PSDs

are estimated assuming that an estimate of the noise PSD matrix is

available. Simulation results for several realistic acoustic scenarios

show that the proposed blocking-based PSD estimator yields the best

performance when used in an MWF, also yielding a similar or better

performance than the PSD estimator in [10] which assumes that the

noise PSD matrix is known.

2. SIGNAL MODEL AND ASSUMPTIONS

Consider a reverberant and noisy multi-channel acoustic system with a single speech source and $M$ microphones. In the short-time Fourier transform (STFT) domain, the $M$-dimensional vector of the received microphone signals $\mathbf{y}(k,l) = [Y_1(k,l) \; \ldots \; Y_M(k,l)]^T$ at frequency bin $k$ and frame index $l$ is given by

$$\mathbf{y}(k,l) = \underbrace{\mathbf{x}_{\mathrm{e}}(k,l) + \mathbf{x}_{\mathrm{r}}(k,l)}_{\mathbf{x}(k,l)} + \mathbf{v}(k,l), \tag{1}$$

with $\mathbf{x}_{\mathrm{e}}(k,l)$ the direct and early reverberation component, $\mathbf{x}_{\mathrm{r}}(k,l)$ the late reverberation component, $\mathbf{x}(k,l)$ the reverberant component, and $\mathbf{v}(k,l)$ the noise component. The vectors $\mathbf{x}_{\mathrm{e}}(k,l)$, $\mathbf{x}_{\mathrm{r}}(k,l)$, $\mathbf{x}(k,l)$, and $\mathbf{v}(k,l)$ are defined similarly as $\mathbf{y}(k,l)$. The direct and early reverberation component $\mathbf{x}_{\mathrm{e}}(k,l)$ can be expressed as

$$\mathbf{x}_{\mathrm{e}}(k,l) = S(k,l)\,\mathbf{d}(k), \tag{2}$$

with $S(k,l)$ the target signal, i.e., the direct and early reverberation component received at the reference microphone, and $\mathbf{d}(k) = [D_1(k) \; \ldots \; D_M(k)]^T$ the $M$-dimensional vector of relative transfer functions (RTFs) of the target signal between the reference microphone and all microphones. The target signal $S(k,l)$ is often defined as the direct component only, such that the RTF vector $\mathbf{d}(k)$ only depends on the direction of arrival (DOA) of the speech source and the microphone array geometry [10–14, 16]. For conciseness, the frequency index $k$ is omitted in the remainder of this paper.
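As an illustration of how an RTF vector can be built from a DOA when the target signal is defined as the direct component only, the following minimal sketch assumes a far-field plane wave, a linear array lying on one axis, a DOA measured from the array axis (90° is broadside), and a speed of sound of 343 m/s. None of these conventions are specified in the text, and the function name `rtf_from_doa` is hypothetical.

```python
import numpy as np

def rtf_from_doa(mic_x, doa_deg, freq_hz, ref=0, c=343.0):
    """Far-field RTF vector of a linear array (assumed model, not from the paper).

    mic_x:   microphone coordinates along the array axis in metres
    doa_deg: DOA measured from the array axis (90 degrees = broadside)
    """
    x = np.asarray(mic_x, dtype=float)
    # Plane-wave propagation delay at each microphone relative to the origin.
    tau = -x * np.cos(np.deg2rad(doa_deg)) / c
    # Delays relative to the reference microphone, so that d[ref] equals 1.
    rel = tau - tau[ref]
    return np.exp(-2j * np.pi * freq_hz * rel)

# Example: M = 4 microphones with 3 cm spacing, evaluated at 1 kHz.
mic_x = 0.03 * np.arange(4)
d_broadside = rtf_from_doa(mic_x, 90.0, 1000.0)
d_oblique = rtf_from_doa(mic_x, 45.0, 1000.0)
```

For a broadside source all relative delays vanish and the RTF vector reduces to a vector of ones; for any other DOA the entries are pure phase terms with unit magnitude.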

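Assuming that the components in (1) are mutually uncorrelated, the PSD matrix of the microphone signals is equal to

$$\boldsymbol{\Phi}_{\mathbf{y}}(l) = \mathcal{E}\{\mathbf{y}(l)\mathbf{y}^H(l)\} = \underbrace{\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{e}}}(l) + \boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{r}}}(l)}_{\boldsymbol{\Phi}_{\mathbf{x}}(l)} + \boldsymbol{\Phi}_{\mathbf{v}}(l), \tag{3}$$

where $\mathcal{E}$ denotes the expectation operator, $\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{e}}}(l)$ is the direct and early reverberation PSD matrix, $\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{r}}}(l)$ is the late reverberation PSD matrix, $\boldsymbol{\Phi}_{\mathbf{x}}(l)$ is the reverberant PSD matrix, and $\boldsymbol{\Phi}_{\mathbf{v}}(l)$ is the noise PSD matrix. The PSD matrix $\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{e}}}(l)$ can be expressed as (cf. (2))

$$\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{e}}}(l) = \Phi_s(l)\,\mathbf{d}\mathbf{d}^H, \tag{4}$$

with $\Phi_s(l)$ the time-varying PSD of the target signal, i.e., $\Phi_s(l) = \mathcal{E}\{|S(l)|^2\}$. Modeling the late reverberation as a diffuse sound field [10–18], the PSD matrix $\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{r}}}(l)$ can be expressed as

$$\boldsymbol{\Phi}_{\mathbf{x}_{\mathrm{r}}}(l) = \Phi_r(l)\,\boldsymbol{\Gamma}, \tag{5}$$

with $\Phi_r(l)$ the time-varying PSD of the late reverberation and $\boldsymbol{\Gamma}$ the spatial coherence matrix of a diffuse sound field, which can be analytically computed based on the microphone array geometry [21]. Modeling the additive noise as a spatially homogeneous sound field, the noise PSD matrix $\boldsymbol{\Phi}_{\mathbf{v}}(l)$ can be expressed as

$$\boldsymbol{\Phi}_{\mathbf{v}}(l) = \Phi_v(l)\,\boldsymbol{\Psi}, \tag{6}$$

with $\Phi_v(l)$ the time-varying noise PSD and $\boldsymbol{\Psi}$ the spatial coherence matrix of the noise, which is assumed to be time-invariant. In the presence of spatially uncorrelated noise (e.g., microphone self-noise), $\boldsymbol{\Psi} = \mathbf{I}$, with $\mathbf{I}$ the $M \times M$-dimensional identity matrix. Using (4), (5), and (6), the PSD matrix $\boldsymbol{\Phi}_{\mathbf{y}}(l)$ is equal to

$$\boldsymbol{\Phi}_{\mathbf{y}}(l) = \Phi_s(l)\,\mathbf{d}\mathbf{d}^H + \Phi_r(l)\,\boldsymbol{\Gamma} + \Phi_v(l)\,\boldsymbol{\Psi}. \tag{7}$$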
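The structure of the model PSD matrix in (7) can be sketched numerically as follows. The sketch assumes a spherically isotropic diffuse field, for which the coherence between two microphones at distance $r$ is $\sin(2\pi f r/c)/(2\pi f r/c)$; this is one of the noise field models for which closed-form coherence functions exist, but the text does not state which one is used. The function names, the speed of sound, and the PSD values are illustrative assumptions.

```python
import numpy as np

def diffuse_coherence(mic_pos, freq_hz, c=343.0):
    """Coherence matrix of a spherically isotropic field (assumed model).

    mic_pos: (M, 3) array of microphone coordinates in metres.
    """
    pos = np.asarray(mic_pos, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), hence the argument 2 f r / c.
    return np.sinc(2.0 * freq_hz * dist / c)

def model_psd_matrix(phi_s, phi_r, phi_v, d, Gamma, Psi):
    """Rank-one target term plus scaled coherence matrices, as in (7)."""
    return phi_s * np.outer(d, d.conj()) + phi_r * Gamma + phi_v * Psi

# Example: 4-microphone linear array with 3 cm spacing at 1 kHz.
mic_pos = np.stack([0.03 * np.arange(4), np.zeros(4), np.zeros(4)], axis=1)
Gamma = diffuse_coherence(mic_pos, 1000.0)
Psi = np.eye(4)                      # spatially uncorrelated noise
d = np.ones(4, dtype=complex)        # broadside source (illustrative)
Phi_y = model_psd_matrix(1.0, 0.5, 0.1, d, Gamma, Psi)
```

Only the three scalars passed to `model_psd_matrix` vary over time in this model; `d`, `Gamma`, and `Psi` are fixed per frequency bin.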

Given the filter vector $\mathbf{w}(l) = [W_1(l) \; \ldots \; W_M(l)]^T$, the output signal $Z(l)$ of the speech enhancement system is equal to the sum of the filtered microphone signals, i.e., $Z(l) = \mathbf{w}^H(l)\mathbf{y}(l)$. Dereverberation and noise reduction techniques aim at designing the filter $\mathbf{w}(l)$ such that the output signal $Z(l)$ is as close as possible to the target signal $S(l)$. A widely used dereverberation and noise reduction technique is the MWF, which aims at minimizing the mean-square error between $Z(l)$ and $S(l)$ [4–6]. The MWF is typically implemented as a minimum variance distortionless response (MVDR) beamformer $\mathbf{w}_{\mathrm{MVDR}}(l)$ followed by a single-channel Wiener postfilter $G(l)$ [10,12–18], i.e.,

$$\mathbf{w}_{\mathrm{MWF}}(l) = \underbrace{\frac{[\hat{\Phi}_r(l)\boldsymbol{\Gamma} + \hat{\Phi}_v(l)\boldsymbol{\Psi}]^{-1}\mathbf{d}}{\mathbf{d}^H[\hat{\Phi}_r(l)\boldsymbol{\Gamma} + \hat{\Phi}_v(l)\boldsymbol{\Psi}]^{-1}\mathbf{d}}}_{\mathbf{w}_{\mathrm{MVDR}}(l)} \; \underbrace{\frac{\hat{\rho}(l)}{1 + \hat{\rho}(l)}}_{G(l)}, \tag{8}$$

with $\hat{\Phi}_r(l)$ and $\hat{\Phi}_v(l)$ denoting the estimated late reverberation and noise PSDs, respectively, and $\hat{\rho}(l)$ denoting the estimated target-to-late reverberation and noise ratio (TRNR) at the output of the MVDR beamformer. The TRNR can be estimated as

$$\hat{\rho}(l) = \frac{\hat{\Phi}_s(l)}{\hat{\Phi}_{rv}(l)}, \tag{9}$$

with $\hat{\Phi}_s(l)$ denoting the estimated target signal PSD and

$$\hat{\Phi}_{rv}(l) = \frac{1}{\mathbf{d}^H[\hat{\Phi}_r(l)\boldsymbol{\Gamma} + \hat{\Phi}_v(l)\boldsymbol{\Psi}]^{-1}\mathbf{d}}$$

the estimated residual late reverberation and noise PSD at the output of the MVDR beamformer. Alternatively, $\hat{\rho}(l)$ can be estimated using the decision directed approach as [16, 22]

$$\hat{\rho}(l) = \beta\,\frac{|Z(l-1)|^2}{\hat{\Phi}_{rv}(l-1)} + (1 - \beta)\,\frac{\hat{\Phi}_s(l)}{\hat{\Phi}_{rv}(l)}, \tag{10}$$

with $\beta$ a smoothing parameter. As can be observed in (8), (9), and (10), the implementation of the MWF requires estimates of the time-varying target signal, late reverberation, and noise PSDs. The objective of this paper is to derive estimates $\hat{\Phi}_s(l)$, $\hat{\Phi}_r(l)$, and $\hat{\Phi}_v(l)$, assuming that the RTF vector $\mathbf{d}$, the diffuse spatial coherence matrix $\boldsymbol{\Gamma}$, and the noise spatial coherence matrix $\boldsymbol{\Psi}$ are known. The RTF vector can be constructed based on a DOA estimate, the diffuse spatial coherence matrix can be constructed based on the microphone array geometry, and the noise spatial coherence matrix can be constructed assuming a reasonable sound field model for the noise.
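The MVDR beamformer followed by a Wiener postfilter, with the TRNR computed from the PSD estimates as in (9), can be sketched per time-frequency bin as below. This is a minimal illustration under the stated signal model, not the authors' implementation; the function name `mwf_weights` and the toy inputs are assumptions, and the gain floor and the decision directed smoothing are omitted.

```python
import numpy as np

def mwf_weights(d, Gamma, Psi, phi_s, phi_r, phi_v):
    """MVDR beamformer times Wiener postfilter for one time-frequency bin."""
    # Late reverberation plus noise PSD matrix built from the PSD estimates.
    R = phi_r * Gamma + phi_v * Psi
    # R^{-1} d via a linear solve, which avoids forming the inverse explicitly.
    Rinv_d = np.linalg.solve(R, d)
    denom = np.real(d.conj() @ Rinv_d)      # d^H R^{-1} d (real for Hermitian R)
    w_mvdr = Rinv_d / denom
    phi_res = 1.0 / denom                   # residual PSD at the MVDR output
    rho = phi_s / phi_res                   # TRNR estimate
    gain = rho / (1.0 + rho)                # single-channel Wiener postfilter
    return gain * w_mvdr, w_mvdr, rho

# Toy example: uncorrelated interference, broadside source, M = 4.
d = np.ones(4, dtype=complex)
w_mwf, w_mvdr, rho = mwf_weights(d, np.eye(4), np.eye(4), 1.0, 0.5, 0.5)
```

In this toy case the MVDR beamformer reduces to averaging the four channels, so the interference power drops by a factor of four and the TRNR at the beamformer output is 4. The output signal would then be obtained as `np.vdot(w_mwf, y)`, which conjugates its first argument.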

3. JOINT TARGET SIGNAL, LATE REVERBERATION,

AND NOISE PSD ESTIMATORS

To the best of our knowledge, state-of-the-art multi-channel PSD estimators do not explicitly model the noise as a spatially homogeneous sound field and only derive target signal and late reverberation PSD estimates $\hat{\Phi}_s(l)$ and $\hat{\Phi}_r(l)$ assuming that an estimate of the noise PSD matrix $\boldsymbol{\Phi}_{\mathbf{v}}(l)$ is available [10–18]. The noise PSD matrix is typically estimated from the microphone signals during speech pauses detected by means of a VAD [19,20], generally requiring the noise PSD $\Phi_v(l)$ to be time-invariant. Instead of assuming that an estimate of the noise PSD matrix $\boldsymbol{\Phi}_{\mathbf{v}}(l)$ is available, in this paper we assume that only knowledge of the noise spatial coherence matrix $\boldsymbol{\Psi}$ is available and propose a non-blocking-based and a blocking-based estimator of the target signal PSD $\Phi_s(l)$, the late reverberation PSD $\Phi_r(l)$, and the noise PSD $\Phi_v(l)$. The proposed PSD estimators can be viewed as extensions of the PSD estimators in [10] and [16], where only estimates of $\Phi_s(l)$ and $\Phi_r(l)$ are derived assuming that the noise PSD matrix $\boldsymbol{\Phi}_{\mathbf{v}}(l)$ is known.

3.1. Non-blocking-based PSD estimator

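In the following we propose to simultaneously estimate the target signal, late reverberation, and noise PSDs using the signal model in (7) and an estimate of the PSD matrix $\boldsymbol{\Phi}_{\mathbf{y}}(l)$. An estimate of $\boldsymbol{\Phi}_{\mathbf{y}}(l)$ can be directly obtained from the microphone signals using recursive averaging as

$$\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l) = \alpha\,\mathbf{y}(l)\mathbf{y}^H(l) + (1 - \alpha)\,\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l-1), \tag{11}$$

with $\alpha$ a smoothing factor. Matching (11) to (7) and since the matrices $\mathbf{d}\mathbf{d}^H$, $\boldsymbol{\Gamma}$, and $\boldsymbol{\Psi}$ are known, a system of $M(M+1)/2$ equations with three unknowns $\Phi_s(l)$, $\Phi_r(l)$, and $\Phi_v(l)$ arises¹. For $M \geq 3$, the system of equations is overdetermined and an estimate of the unknown PSDs $\Phi_s(l)$, $\Phi_r(l)$, and $\Phi_v(l)$ can be obtained by minimizing the least-squares cost function²

$$J_{\mathrm{n}}(l) = \big\| \hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l) - \Phi_s(l)\,\mathbf{d}\mathbf{d}^H - \Phi_r(l)\,\boldsymbol{\Gamma} - \Phi_v(l)\,\boldsymbol{\Psi} \big\|_F^2, \tag{12}$$

with $\|\cdot\|_F$ the matrix Frobenius norm. Setting the derivative of (12) with respect to $\Phi_s(l)$, $\Phi_r(l)$, and $\Phi_v(l)$ to 0 results in a system of equations which can be written as

$$\underbrace{\begin{bmatrix} (\mathbf{d}^H\mathbf{d})^2 & \mathbf{d}^H\boldsymbol{\Gamma}\mathbf{d} & \mathbf{d}^H\boldsymbol{\Psi}\mathbf{d} \\ \mathbf{d}^H\boldsymbol{\Gamma}\mathbf{d} & \mathrm{tr}\{\boldsymbol{\Gamma}^H\boldsymbol{\Gamma}\} & \mathrm{tr}\{\boldsymbol{\Gamma}^H\boldsymbol{\Psi}\} \\ \mathbf{d}^H\boldsymbol{\Psi}\mathbf{d} & \mathrm{tr}\{\boldsymbol{\Gamma}^H\boldsymbol{\Psi}\} & \mathrm{tr}\{\boldsymbol{\Psi}^H\boldsymbol{\Psi}\} \end{bmatrix}}_{\mathbf{A}_{\mathrm{n}}} \underbrace{\begin{bmatrix} \hat{\Phi}_{s,\mathrm{n}}(l) \\ \hat{\Phi}_{r,\mathrm{n}}(l) \\ \hat{\Phi}_{v,\mathrm{n}}(l) \end{bmatrix}}_{\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l)} = \underbrace{\begin{bmatrix} \mathbf{d}^H\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l)\mathbf{d} \\ \mathrm{tr}\{\hat{\boldsymbol{\Phi}}_{\mathbf{y}}^H(l)\boldsymbol{\Gamma}\} \\ \mathrm{tr}\{\hat{\boldsymbol{\Phi}}_{\mathbf{y}}^H(l)\boldsymbol{\Psi}\} \end{bmatrix}}_{\mathbf{p}_{\mathrm{n}}(l)}, \tag{13}$$

where $\mathrm{tr}\{\cdot\}$ denotes the trace operator and the quantities $\mathbf{A}_{\mathrm{n}}$, $\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l)$, and $\mathbf{p}_{\mathrm{n}}(l)$ have been introduced in order to simplify the notation. The solution to (13) is given by

$$\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l) = \mathbf{A}_{\mathrm{n}}^{-1}\,\mathbf{p}_{\mathrm{n}}(l), \tag{14}$$

with the proposed target signal PSD estimate $\hat{\Phi}_{s,\mathrm{n}}(l)$ being the first element of $\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l)$, the late reverberation PSD estimate $\hat{\Phi}_{r,\mathrm{n}}(l)$ being the second element of $\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l)$, and the noise PSD estimate $\hat{\Phi}_{v,\mathrm{n}}(l)$ being the third element of $\hat{\boldsymbol{\phi}}_{\mathrm{n}}(l)$.

¹ Note that since the matrices $\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l)$, $\mathbf{d}\mathbf{d}^H$, $\boldsymbol{\Gamma}$, and $\boldsymbol{\Psi}$ are symmetric, matching (11) to (7) yields $M(M+1)/2$ equations instead of $M^2$ equations.

² Note that this non-blocking-based least-squares cost function has already been used in [23] in the context of noise reduction only, in order to estimate the PSDs of different spatially homogeneous noise fields.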
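The recursive averaging step and the resulting 3 × 3 system of normal equations can be sketched as follows. The sketch is an illustration under the stated model; the function names and the synthetic test scenario (3 cm linear array, 3 kHz, a 45° far-field source, a sinc-shaped diffuse coherence) are assumptions rather than values taken from the experiments.

```python
import numpy as np

def update_psd(Phi_prev, y, alpha):
    """One step of recursive averaging of the outer product y y^H."""
    return alpha * np.outer(y, y.conj()) + (1.0 - alpha) * Phi_prev

def nonblocking_psd_estimate(Phi_y, d, Gamma, Psi):
    """Least-squares fit of Phi_y by phi_s dd^H + phi_r Gamma + phi_v Psi."""
    basis = [np.outer(d, d.conj()), Gamma, Psi]
    # Normal equations: A[i, j] = tr{B_i^H B_j}, p[i] = tr{B_i^H Phi_y}.
    # For Hermitian matrices these traces are real, so only the real part is kept.
    A = np.array([[np.trace(Bi.conj().T @ Bj).real for Bj in basis]
                  for Bi in basis])
    p = np.array([np.trace(Bi.conj().T @ Phi_y).real for Bi in basis])
    return np.linalg.solve(A, p)   # [phi_s, phi_r, phi_v]

# Synthetic check: a PSD matrix that follows the model exactly.
M, f, c = 4, 3000.0, 343.0
x = 0.03 * np.arange(M)
Gamma = np.sinc(2.0 * f * np.abs(x[:, None] - x[None, :]) / c)
Psi = np.eye(M)
d = np.exp(2j * np.pi * f * x * np.cos(np.deg2rad(45.0)) / c)
true_psds = np.array([1.0, 0.5, 0.1])
Phi_y = (true_psds[0] * np.outer(d, d.conj())
         + true_psds[1] * Gamma + true_psds[2] * Psi)
est = nonblocking_psd_estimate(Phi_y, d, Gamma, Psi)
```

When the PSD matrix follows the model exactly, the least-squares solution recovers the three PSDs; with a recursively averaged estimate the fit is only approximate, and a practical implementation would likely also need to guard against negative PSD estimates.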

3.2. Blocking-based PSD estimator

In the following we propose an alternative PSD estimator which ﬁrst

estimates the late reverberation and noise PSDs using reference sig-

nals at the output of a blocking matrix aiming to block the target

signal. Based on the estimated late reverberation and noise PSDs,

the target signal PSD is then estimated in a second step.

In order to block the target signal, an $M \times (M-1)$-dimensional blocking matrix $\mathbf{B}$ is constructed such that

$$\mathbf{B}^H\mathbf{d} = \mathbf{0}, \tag{15}$$

and a set of $M-1$ reference signals $\tilde{\mathbf{u}}(l)$ containing only late reverberation and noise is generated as $\tilde{\mathbf{u}}(l) = \mathbf{B}^H\mathbf{y}(l)$. There exist many blocking matrices which satisfy (15). In this paper, the blocking matrix is computed from the first $M-1$ columns of the matrix $\mathbf{T}$ defined as

$$\mathbf{T} = \mathbf{I} - \frac{\mathbf{d}\mathbf{d}^H}{\|\mathbf{d}\|_2^2}. \tag{16}$$
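A blocking matrix of this form can be sketched in a few lines. The matrix T is the orthogonal projector onto the subspace orthogonal to d, so any subset of its columns is orthogonal to d; the function name and the example RTF vector below are illustrative assumptions.

```python
import numpy as np

def blocking_matrix(d):
    """First M-1 columns of T = I - d d^H / ||d||^2."""
    M = d.shape[0]
    T = np.eye(M) - np.outer(d, d.conj()) / np.real(d.conj() @ d)
    return T[:, :M - 1]

# Example RTF vector with unit-magnitude entries (illustrative phases).
d = np.exp(1j * np.array([0.0, 0.4, 0.8, 1.2]))
B = blocking_matrix(d)
```

For an RTF vector whose last entry is non-zero, the first M − 1 columns of T are linearly independent, so the reference signals span the full (M − 1)-dimensional subspace in which the target signal is blocked.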

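Based on (7) and (15), the PSD matrix of the reference signals at the blocking matrix output can be expressed as

$$\boldsymbol{\Phi}_{\tilde{\mathbf{u}}}(l) = \mathcal{E}\{\tilde{\mathbf{u}}(l)\tilde{\mathbf{u}}^H(l)\} = \Phi_r(l)\,\underbrace{\mathbf{B}^H\boldsymbol{\Gamma}\mathbf{B}}_{\tilde{\boldsymbol{\Gamma}}} + \Phi_v(l)\,\underbrace{\mathbf{B}^H\boldsymbol{\Psi}\mathbf{B}}_{\tilde{\boldsymbol{\Psi}}}. \tag{17}$$

The matrices $\tilde{\boldsymbol{\Gamma}}$ and $\tilde{\boldsymbol{\Psi}}$ can be computed using the known spatial coherence matrices $\boldsymbol{\Gamma}$ and $\boldsymbol{\Psi}$, and an estimate $\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l)$ of the PSD matrix $\boldsymbol{\Phi}_{\tilde{\mathbf{u}}}(l)$ can be directly obtained from the reference signals similarly to (11). Matching the estimated PSD matrix $\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l)$ to (17) gives rise to a system of $M(M-1)/2$ equations with two unknowns $\Phi_r(l)$ and $\Phi_v(l)$³. For $M \geq 3$, the system of equations is overdetermined and an estimate of $\Phi_r(l)$ and $\Phi_v(l)$ can be obtained by minimizing the least-squares cost function

$$J_{\mathrm{b}}(l) = \big\| \hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l) - \Phi_r(l)\,\tilde{\boldsymbol{\Gamma}} - \Phi_v(l)\,\tilde{\boldsymbol{\Psi}} \big\|_F^2. \tag{18}$$

Setting the derivative of (18) with respect to $\Phi_r(l)$ and $\Phi_v(l)$ to 0 yields a system of equations which can be written as

$$\underbrace{\begin{bmatrix} \mathrm{tr}\{\tilde{\boldsymbol{\Gamma}}^H\tilde{\boldsymbol{\Gamma}}\} & \mathrm{tr}\{\tilde{\boldsymbol{\Gamma}}^H\tilde{\boldsymbol{\Psi}}\} \\ \mathrm{tr}\{\tilde{\boldsymbol{\Gamma}}^H\tilde{\boldsymbol{\Psi}}\} & \mathrm{tr}\{\tilde{\boldsymbol{\Psi}}^H\tilde{\boldsymbol{\Psi}}\} \end{bmatrix}}_{\mathbf{A}_{\mathrm{b}}} \underbrace{\begin{bmatrix} \hat{\Phi}_{r,\mathrm{b}}(l) \\ \hat{\Phi}_{v,\mathrm{b}}(l) \end{bmatrix}}_{\hat{\boldsymbol{\phi}}_{\mathrm{b}}(l)} = \underbrace{\begin{bmatrix} \mathrm{tr}\{\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}^H(l)\tilde{\boldsymbol{\Gamma}}\} \\ \mathrm{tr}\{\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}^H(l)\tilde{\boldsymbol{\Psi}}\} \end{bmatrix}}_{\mathbf{p}_{\mathrm{b}}(l)}, \tag{19}$$

where the quantities $\mathbf{A}_{\mathrm{b}}$, $\hat{\boldsymbol{\phi}}_{\mathrm{b}}(l)$, and $\mathbf{p}_{\mathrm{b}}(l)$ have been introduced in order to simplify the notation. The solution to (19) is given by

$$\hat{\boldsymbol{\phi}}_{\mathrm{b}}(l) = \mathbf{A}_{\mathrm{b}}^{-1}\,\mathbf{p}_{\mathrm{b}}(l), \tag{20}$$

with the proposed blocking-based late reverberation PSD estimate $\hat{\Phi}_{r,\mathrm{b}}(l)$ being the first element of $\hat{\boldsymbol{\phi}}_{\mathrm{b}}(l)$ and the noise PSD estimate $\hat{\Phi}_{v,\mathrm{b}}(l)$ being the second element of $\hat{\boldsymbol{\phi}}_{\mathrm{b}}(l)$. Using the late reverberation and noise PSD estimates $\hat{\Phi}_{r,\mathrm{b}}(l)$ and $\hat{\Phi}_{v,\mathrm{b}}(l)$, the blocking-based target signal PSD can be estimated as

$$\hat{\Phi}_{s,\mathrm{b}}(l) = \frac{1}{\mathbf{d}^H\mathbf{d}}\,\mathrm{tr}\{\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l) - \hat{\Phi}_{r,\mathrm{b}}(l)\,\boldsymbol{\Gamma} - \hat{\Phi}_{v,\mathrm{b}}(l)\,\boldsymbol{\Psi}\}. \tag{21}$$

³ Note that since the matrices $\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l)$, $\tilde{\boldsymbol{\Gamma}}$, and $\tilde{\boldsymbol{\Psi}}$ are symmetric, matching $\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l)$ to (17) yields $M(M-1)/2$ equations instead of $(M-1)^2$ equations.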
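The two-step blocking-based procedure can be sketched as below. Two simplifications are assumptions of this sketch rather than statements from the text: the reference-signal PSD matrix is obtained by projecting an estimate of the microphone PSD matrix through the blocking matrix (equivalent to recursively averaging the reference signals when the same smoothing factor and a consistent initialization are used), and the trace in the last step is divided by d^H d, since under the model the trace of the target term equals the target PSD times d^H d. The function name and the synthetic scenario are illustrative.

```python
import numpy as np

def blocking_psd_estimate(Phi_y, d, Gamma, Psi):
    """Blocking-based estimates [phi_s, phi_r, phi_v] for one bin."""
    M = d.shape[0]
    norm_d2 = np.real(d.conj() @ d)
    # Blocking matrix: first M-1 columns of the projector orthogonal to d.
    B = (np.eye(M) - np.outer(d, d.conj()) / norm_d2)[:, :M - 1]
    BH = B.conj().T
    Phi_u = BH @ Phi_y @ B          # reference-signal PSD matrix (target blocked)
    basis = [BH @ Gamma @ B, BH @ Psi @ B]
    # 2 x 2 normal equations of the Frobenius-norm fit.
    A = np.array([[np.trace(Bi.conj().T @ Bj).real for Bj in basis]
                  for Bi in basis])
    p = np.array([np.trace(Bi.conj().T @ Phi_u).real for Bi in basis])
    phi_r, phi_v = np.linalg.solve(A, p)
    # Second step: target PSD from the trace of the residual matrix.
    phi_s = np.trace(Phi_y - phi_r * Gamma - phi_v * Psi).real / norm_d2
    return np.array([phi_s, phi_r, phi_v])

# Synthetic check with a PSD matrix that follows the model exactly.
M, f, c = 4, 3000.0, 343.0
x = 0.03 * np.arange(M)
Gamma = np.sinc(2.0 * f * np.abs(x[:, None] - x[None, :]) / c)
Psi = np.eye(M)
d = np.exp(2j * np.pi * f * x * np.cos(np.deg2rad(45.0)) / c)
true_psds = np.array([1.0, 0.5, 0.1])
Phi_y = (true_psds[0] * np.outer(d, d.conj())
         + true_psds[1] * Gamma + true_psds[2] * Psi)
est_b = blocking_psd_estimate(Phi_y, d, Gamma, Psi)
```

With a model-consistent input the blocking-based estimates coincide with the true PSDs, which matches the remark that both proposed estimators agree when the signal model holds perfectly.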

It should be noted that if the signal model in (7) perfectly holds,

the non-blocking-based estimator proposed in Section 3.1 and the

blocking-based estimator proposed in this section would result in

the same PSD estimates. In practice however, the signal model in (7)

does not perfectly hold since the early and late reverberation com-

ponents are not perfectly uncorrelated, the late reverberation is not

perfectly diffuse, and the noise cannot be typically perfectly modeled

by a spatially homogeneous sound field. Furthermore, estimating the matrices $\boldsymbol{\Phi}_{\mathbf{y}}(l)$ and $\boldsymbol{\Phi}_{\tilde{\mathbf{u}}}(l)$ by recursive averaging of a single realization of the signals only approximates the expectation operator. As

a result, the proposed PSD estimators yield different PSD estimates

in practice. As will be shown in Section 4, using the blocking-based

PSD estimates in an MWF yields a better performance than using

the non-blocking-based PSD estimates.

4. EXPERIMENTAL RESULTS

In this section, we investigate the dereverberation and noise reduc-

tion performance of the MWF using the proposed PSD estimators

and two alternative versions to compute the TRNR. More precisely,

we investigate the performance of the MWF implemented using

• the proposed non-blocking-based estimator with the TRNR

estimated as in (9), which will be referred to as NBB,

• the proposed non-blocking-based estimator with the TRNR

estimated as in (10), which will be referred to as NBB-DD,

• the proposed blocking-based estimator with the TRNR esti-

mated as in (9), which will be referred to as BB, and

• the proposed blocking-based estimator with the TRNR esti-

mated as in (10), which will be referred to as BB-DD.

In addition, the performance of the BB and BB-DD methods will be

compared to the performance of the MWF implemented using the

target signal and late reverberation PSD estimates from [10], where

it is assumed that an estimate of the noise PSD matrix is available.

4.1. Setup and instrumental measures

We consider three multi-channel acoustic systems with a single speech source and $M = 4$ microphones. The first acoustic system consists of a linear microphone array with an inter-sensor distance of 3 cm [24], the second acoustic system consists of a circular microphone array with a radius of 10 cm [25], and the third acoustic system consists of a linear microphone array with an inter-sensor distance of 6 cm [26]. Table 1 presents the reverberation time $T_{60}$, the DOA $\theta$ of the speech source, and the direct-to-reverberation ratio (DRR) for each acoustic system. The speech components are generated by convolving a 38 s long clean speech signal with measured room impulse responses at a sampling frequency $f_s = 16$ kHz. The noise components consist of stationary uncorrelated noise with a broadband reverberant signal-to-noise ratio (RSNR) between 10 dB and 40 dB. The reverberant speech-plus-noise signal is preceded by a 1 s long noise-only segment such that when using the PSD estimator from [10], the noise PSD matrix can be estimated from the noise-only segment. The signals are processed using a weighted overlap-add STFT framework with a frame size of 1024 samples and an overlap of 75%. The first microphone is arbitrarily selected as the reference microphone. The target signal is defined as the direct component only, such that the RTF vector can be computed based on the DOA of the speech source.

Table 1: Characteristics of the considered acoustic systems.

Acoustic system    T_60 [s]    θ       DRR [dB]
1                  0.61        90°     −0.76
2                  0.73        45°     1.43
3                  1.25        −15°    −0.04

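The PSD matrices $\hat{\boldsymbol{\Phi}}_{\mathbf{y}}(l)$ and $\hat{\boldsymbol{\Phi}}_{\tilde{\mathbf{u}}}(l)$ are estimated as in (11) with a smoothing factor $\alpha$ corresponding to a time constant of 40 ms. The diffuse spatial coherence matrix $\boldsymbol{\Gamma}$ is computed based on the microphone array geometry and the noise spatial coherence matrix is set to $\boldsymbol{\Psi} = \mathbf{I}$. The smoothing parameter in (10) is set to $\beta = 0.98$ and the minimum gain of the single-channel Wiener postfilter is set to −17 dB. For the estimator from [10], the noise PSD matrix $\boldsymbol{\Phi}_{\mathbf{v}}$ is estimated as

$$\hat{\boldsymbol{\Phi}}_{\mathbf{v}} = \frac{1}{L_v}\sum_{l=1}^{L_v} \mathbf{v}(l)\mathbf{v}^H(l), \tag{22}$$

with $L_v$ being the total number of noise-only segments.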
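The text gives the smoothing as a time constant and the gain floor in dB without stating how these map to $\alpha$ and to a linear gain. The following sketch uses one common convention, namely a forgetting factor of exp(−hop / (time constant × sampling rate)) applied to the previous estimate in (11), and an amplitude gain floor of 10^(dB/20). Both conventions are assumptions, so the resulting numbers are indicative only.

```python
import math

def smoothing_factor(time_constant_s, frame_len, overlap, fs_hz):
    """Weight of the newest frame in (11) for a given time constant (assumed mapping)."""
    hop = frame_len * (1.0 - overlap)          # frame shift in samples
    forgetting = math.exp(-hop / (time_constant_s * fs_hz))
    return 1.0 - forgetting

# Setup values from the text: 1024-sample frames, 75% overlap, 16 kHz, 40 ms.
alpha = smoothing_factor(0.040, 1024, 0.75, 16000)

# Gain floor of -17 dB, interpreted here as an amplitude gain (assumption).
g_min = 10.0 ** (-17.0 / 20.0)
```

Under these assumptions the frame shift is 256 samples (16 ms), which gives $\alpha \approx 0.33$ and a linear gain floor of about 0.14.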

The performance is evaluated in terms of the improvement

in frequency-weighted segmental SNR (∆fwSSNR) [27] and log-

likelihood ratio (∆LLR) [27] between the output signal and the

reference microphone signal. The fwSSNR and LLR measures are

intrusive measures comparing the signal being evaluated to a refer-

ence signal. The reference signal used in this paper is the anechoic

speech signal. It should be noted that a positive ∆fwSSNR and a

negative ∆LLR indicate a performance improvement.

4.2. Performance of the proposed estimators

In this section the performance of NBB, NBB-DD, BB, and BB-DD

is investigated for all considered RSNRs and acoustic systems. The

presented performance measures are averaged over all considered

acoustic systems.

Fig. 1 depicts the performance of all considered techniques in terms of ∆fwSSNR and ∆LLR. It can be observed that, as expected, for all considered techniques the performance improvement decreases as the RSNR increases. Furthermore, it can be observed that in terms of both performance measures and for all considered RSNRs, a larger performance improvement is obtained when using NBB-DD instead of NBB, suggesting that smoothing the TRNR estimate using the decision directed approach is particularly important when using non-blocking-based PSD estimates. In addition, it can be observed that BB and BB-DD outperform NBB and NBB-DD for all considered RSNRs. While BB-DD yields the highest ∆fwSSNR, BB results in the largest LLR improvement, i.e., the most negative ∆LLR. Informal listening tests suggest that BB-DD yields a better perceptual quality than BB, with BB introducing more musical noise and signal artifacts than BB-DD. In summary, the presented results show that for the considered acoustic scenarios, the proposed blocking-based PSD estimates yield a better performance than the non-blocking-based PSD estimates.

Fig. 1: MWF performance using the proposed PSD estimators (NBB, NBB-DD, BB, BB-DD). Panel (a): ∆fwSSNR [dB] versus RSNR [dB]; panel (b): ∆LLR [dB] versus RSNR [dB].

Table 2: Average performance of the MWF using the proposed blocking-based estimator and the estimator from [10] which assumes that the noise PSD matrix is known (RSNR = 10 dB).

                 BB       BR       BB-DD    BR-DD
∆fwSSNR [dB]     6.47     5.31     7.07     6.53
∆LLR [dB]        −0.41    −0.33    −0.36    −0.31

4.3. Performance of the proposed blocking-based estimator and

the state-of-the-art estimator from [10]

In this section, the performance of BB and BB-DD is compared to

the performance of the estimator from [10], which uses a blocking

matrix and only estimates the target signal and late reverberation

PSDs, assuming that an estimate of the noise PSD matrix is available. The noise PSD matrix is estimated as in (22) and the MWF is implemented using $\hat{\boldsymbol{\Phi}}_{\mathbf{v}}$ (instead of $\hat{\Phi}_v(l)\boldsymbol{\Psi}$ in (8)) with the TRNR estimated as in (9) or (10). Using [10] with the TRNR estimated as

in (9) will be referred to as BR, whereas using [10] with the TRNR

estimated as in (10) will be referred to as BR-DD. Due to space con-

straints, only the performance for RSNR = 10 dB is presented and

similarly as before, the performance is averaged over all considered

acoustic systems.

Table 2 depicts the performance of the considered techniques in

terms of ∆fwSSNR and ∆LLR. It can be observed that BB and BB-

DD result in a similar or better performance than BR and BR-DD,

respectively. It should be noted that the noise PSD matrix estimate

used for BR and BR-DD is rather accurate, since the noise is station-

ary and all noise-only segments are used to compute the PSD matrix.

The presented results show that the proposed blocking-based estima-

tor manages to remove the assumption that the noise PSD matrix is

known and additionally estimates the noise PSD without hindering

the dereverberation and noise reduction performance.

5. CONCLUSION

In this paper joint estimators for the late reverberation and noise

PSDs have been derived, removing the assumption made by state-

of-the-art late reverberation PSD estimators that the noise PSD ma-

trix is known. Modeling the noise as a spatially homogeneous sound

ﬁeld with an unknown time-varying PSD and a known time-invariant

spatial coherence matrix, we have derived a non-blocking-based and

a blocking-based joint estimator of the late reverberation and noise

PSDs. Simulation results show that the proposed blocking-based

PSD estimator yields the best performance when used in an MWF,

also yielding a similar or better performance than a state-of-the-art

blocking-based late reverberation PSD estimator which assumes that

the noise PSD matrix is known.

6. REFERENCES

[1] J. S. Bradley, H. Sato, and M. Picard, “On the importance of

early reﬂections for speech in rooms,” Journal of the Acous-

tical Society of America, vol. 113, no. 6, pp. 3233–3244, June

2003.

[2] R. Beutelmann and T. Brand, “Prediction of speech intelligi-

bility in spatial noise and reverberation for normal-hearing and

hearing-impaired listeners,” Journal of the Acoustical Society

of America, vol. 120, no. 1, pp. 331–342, July 2006.

[3] A. Warzybok, I. Kodrasi, J. O. Jungmann, E. A. P. Ha-

bets, T. Gerkmann, A. Mertins, S. Doclo, B. Kollmeier, and

S. Goetze, “Subjective speech quality and speech intelligibil-

ity evaluation of single-channel dereverberation algorithms,”

in Proc. International Workshop on Acoustic Echo and Noise

Control, Antibes, France, Sept. 2014, pp. 333–337.

[4] S. Doclo and M. Moonen, “Combined frequency-domain

dereverberation and noise reduction technique for multi-

microphone speech enhancement,” in Proc. International

Workshop on Acoustic Echo and Noise Control, Darmstadt,

Germany, Sept. 2001, pp. 31–34.

[5] E. A. P. Habets and J. Benesty, “A two-stage beamforming ap-

proach for noise reduction and dereverberation,” IEEE Trans-

actions on Audio, Speech, and Language Processing, vol. 21,

no. 5, pp. 945–958, May 2013.

[6] B. Cauchi, I. Kodrasi, R. Rehr, S. Gerlach, A. Jukić, T. Gerkmann, S. Doclo, and S. Goetze, “Combination of MVDR

beamforming and single-channel spectral processing for en-

hancing noisy and reverberant speech,” EURASIP Journal on

Advances in Signal Processing, vol. 2015, no. 1, 2015.

[7] K. Lebart and J. M. Boucher, “A new method based on spectral

subtraction for speech dereverberation,” Acta Acoustica, vol.

87, no. 3, pp. 359–366, May-Jun. 2001.

[8] E. A. P. Habets, S. Gannot, and I. Cohen, “Late reverber-

ant spectral variance estimation based on a statistical model,”

IEEE Signal Processing Letters, vol. 16, no. 9, pp. 770–774,

Sept. 2009.

[9] S. Braun, B. Schwartz, S. Gannot, and E. A. P. Habets, “Late

reverberation PSD estimation for single-channel dereverbera-

tion using relative convolutive transfer functions,” in Proc.

International Workshop on Acoustic Echo and Noise Control,

Xi’an, China, Sept. 2016.

[10] S. Braun and E. A. P. Habets, “Dereverberation in noisy en-

vironments using reference signals and a maximum likelihood

estimator,” in Proc. European Signal Processing Conference,

Marrakech, Morocco, Sept. 2013.

[11] O. Thiergart and E. A. P. Habets, “Extracting reverberant sound

using a linearly constrained minimum variance spatial ﬁlter,”

IEEE Signal Processing Letters, vol. 21, no. 5, pp. 630–634,

May 2014.

[12] S. Braun and E. A. P. Habets, “A multichannel diffuse

power estimator for dereverberation in the presence of multi-

ple sources,” EURASIP Journal on Applied Signal Processing,

vol. 2015, no. 1, Dec. 2015.

[13] O. Schwartz, S. Braun, S. Gannot, and E. A. P. Habets, “Maxi-

mum likelihood estimation of the late reverberant power spec-

tral density in noisy environments,” in Proc. IEEE Workshop

on Applications of Signal Processing to Audio and Acoustics,

New York, USA, Oct. 2015.

[14] O. Schwartz, S. Gannot, and E. A. P. Habets, “Joint maximum

likelihood estimation of late reverberant and speech power

spectral density in noisy environments,” in Proc. IEEE Inter-

national Conference on Acoustics, Speech, and Signal Process-

ing, Shanghai, China, Mar. 2016, pp. 151–155.

[15] A. Kuklasiński, S. Doclo, S. H. Jensen, and J. Jensen, “Maximum likelihood PSD estimation for speech enhancement in

reverberation and noise,” IEEE/ACM Transactions on Audio,

Speech, and Language Processing, vol. 24, no. 9, pp. 1595–

1608, Sept. 2016.

[16] O. Schwartz, S. Gannot, and E. A. P. Habets, “Joint estima-

tion of late reverberant and speech power spectral densities in

noisy environments using Frobenius norm,” in Proc. European

Signal Processing Conference, Budapest, Hungary, Sept. 2016,

pp. 1123–1127.

[17] I. Kodrasi and S. Doclo, “Late reverberant power spectral den-

sity estimation based on an eigenvalue decomposition,” in

Proc. IEEE International Conference on Acoustics, Speech,

and Signal Processing, New Orleans, USA, Mar. 2017, pp.

611–615.

[18] I. Kodrasi and S. Doclo, “Multi-channel late reverberation

power spectral density estimation based on nuclear norm min-

imization,” in Proc. IEEE Workshop on Applications of Sig-

nal Processing to Audio and Acoustics, New York, USA, Oct.

2017, pp. 101–105.

[19] J. Ramírez, J. C. Segura, C. Benítez, Á. de la Torre, and A. Rubio, “Efficient voice activity detection algorithms using long-term speech information,” Speech Communication, vol. 42, no.

3, pp. 271–287, Apr. 2004.

[20] K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki,

“Noise robust voice activity detection based on periodic to ape-

riodic component ratio,” Speech Communication, vol. 52, no.

1, pp. 41–60, Jan. 2010.

[21] B. F. Cron and C. H. Sherman, “Spatial-correlation functions

for various noise models,” The Journal of the Acoustical Soci-

ety of America, vol. 34, no. 11, pp. 1732–1736, Nov. 1962.

[22] Y. Ephraim and D. Malah, “Speech enhancement using a min-

imum mean-square error short-time spectral amplitude estima-

tor,” IEEE Transactions on Acoustics, Speech and Signal Pro-

cessing, vol. 32, no. 6, pp. 1109–1121, Dec. 1984.

[23] Y. A. Huang, A. Luebs, J. Skoglund, and W. B. Kleijn, “Glob-

ally optimized least-squares post-ﬁltering for microphone ar-

ray speech enhancement,” in Proc. IEEE International Confer-

ence on Acoustics, Speech, and Signal Processing, Shanghai,

China, Mar. 2016, pp. 380–384.

[24] E. Hadad, F. Heese, P. Vary, and S. Gannot, “Multichannel

audio database in various acoustic environments,” in Proc.

International Workshop on Acoustic Echo and Noise Control,

Antibes, France, Sept. 2014, pp. 313–317.

[25] K. Kinoshita, M. Delcroix, S. Gannot, E. A. P. Habets,

R. Haeb-Umbach, W. Kellermann, V. Leutnant, R. Maas,

T. Nakatani, B. Raj, A. Sehr, and T. Yoshioka, “A summary of

the REVERB challenge: state-of-the-art and remaining chal-

lenges in reverberant speech processing research,” EURASIP

Journal on Advances in Signal Processing, vol. 2016, no. 1,

Jan. 2016.

[26] J. Eaton, N. D. Gaubitch, A. H. Moore, and P. A. Naylor, “The

ACE challenge - Corpus description and performance evalua-

tion,” in Proc. IEEE Workshop on Applications of Signal Pro-

cessing to Audio and Acoustics, New York, USA, Oct. 2015.

[27] S. Quackenbush, T. Barnwell, and M. Clements, Objective

measures of speech quality, Prentice-Hall, New Jersey, USA,

1988.
