A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
Citations
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Auditory Scene Analysis: The Perceptual Organization of Sound by Albert Bregman (review)
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation
Acoustic beamforming for noise source localization - Reviews, methodology and applications
Machine learning in acoustics: theory and applications
References
Maximum likelihood from incomplete data via the EM algorithm
Pattern Recognition and Machine Learning
Pattern Recognition and Machine Learning (Information Science and Statistics)
Independent component analysis, a new concept?
Frequently Asked Questions (14)
Q2. What is the way to implement time-domain filtering?
Time-domain filtering can be exactly implemented in the frequency domain using overlap and save techniques [69], [100], provided that the analysis frame-length is larger than the filter length.
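The equivalence can be checked numerically. Below is a minimal overlap-save sketch in NumPy, assuming a real-valued signal and a time-invariant FIR filter; the function name `overlap_save` and the default `frame_len` are illustrative choices, not taken from the paper.

```python
import numpy as np

def overlap_save(x, h, frame_len=64):
    """Filter x with the FIR filter h by frame-wise FFT processing (overlap-save)."""
    L = len(h)
    if frame_len <= L:
        raise ValueError("frame_len must be larger than the filter length")
    hop = frame_len - (L - 1)            # valid output samples produced per frame
    H = np.fft.rfft(h, frame_len)        # spectrum of the zero-padded filter
    n_frames = -(-len(x) // hop)         # ceiling division
    # L-1 zeros of initial history, then the signal, then zeros up to a whole frame
    padded = np.concatenate(
        [np.zeros(L - 1), x, np.zeros(n_frames * hop - len(x))]
    )
    y = np.empty(n_frames * hop)
    for m in range(n_frames):
        frame = padded[m * hop : m * hop + frame_len]
        circ = np.fft.irfft(np.fft.rfft(frame) * H, frame_len)
        # The first L-1 samples are corrupted by circular wrap-around: discard them
        y[m * hop : (m + 1) * hop] = circ[L - 1 :]
    return y[: len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
h = rng.standard_normal(17)
print(np.allclose(overlap_save(x, h), np.convolve(x, h)[: len(x)]))
```

Each frame overlaps the previous one by L-1 samples, so the samples kept from every circular convolution coincide with the linear convolution; this is why the frame must be longer than the filter.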
Q3. Why is the estimation of the full-rank model more difficult?
Due to the increased number of parameters, the full-rank model is more difficult to estimate, especially when the number of microphones is large.
Q4. What is the advantage of a semi-fixed beamforming approach?
A semi-fixed beamforming approach, suitable for cases when the position of the target source cannot be determined in advance, is to estimate its DOA and to design a FBF steered towards it.
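As an illustration of the second step, the sketch below steers a delay-and-sum beamformer towards an already estimated DOA. It assumes a far-field source, a uniform linear array and a single frequency; the DOA estimator itself is not shown, and all function names and parameter values are hypothetical.

```python
import numpy as np

def ula_steering_vector(doa_deg, n_mics, spacing_m, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    # Relative propagation delay of the plane wave at each microphone
    delays = np.arange(n_mics) * spacing_m * np.cos(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def steered_fbf_weights(doa_estimate_deg, n_mics, spacing_m, freq_hz):
    """Delay-and-sum weights steered towards the estimated DOA."""
    a = ula_steering_vector(doa_estimate_deg, n_mics, spacing_m, freq_hz)
    return a / n_mics                    # normalised so that w^H a = 1

def response(w, doa_deg, n_mics, spacing_m, freq_hz):
    """Beamformer response w^H a towards an arbitrary direction."""
    a = ula_steering_vector(doa_deg, n_mics, spacing_m, freq_hz)
    return np.vdot(w, a)                 # np.vdot conjugates its first argument

w = steered_fbf_weights(60.0, n_mics=4, spacing_m=0.05, freq_hz=1000.0)
print(abs(response(w, 60.0, 4, 0.05, 1000.0)))   # magnitude towards the target
print(abs(response(w, 120.0, 4, 0.05, 1000.0)))  # magnitude towards another direction
```

The weights are recomputed whenever the DOA estimate changes, which is what distinguishes this semi-fixed design from a beamformer steered once towards a known position.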
Q5. What is the way to minimize the noise variance at the output of the beamformer?
One approach is to minimize the noise variance at the output of the beamformer while constraining the maximal distortion incurred by the speech signal, denoted $\sigma_D^2$.
Q6. What is the dereverberation requirement for a beamformer?
Assuming that reverberation alone does not compromise intelligibility, which is the case in many scenarios, the dereverberation requirement can be relaxed.
Q7. What is the way to measure the sound pressure?
A device that can directly measure the sound velocity, i.e. the first-order vector derivative of the sound pressure, is also available [128].
Q8. Why have these models been little used in practice?
These models have been little used in practice, due to the potentially large number of STFT domain filter coefficients to be estimated.
Q9. What can be done to design a matched-filter FBF?
The acoustic impulse responses (AIRs) or the relative transfer functions (RTFs) between the target source position and the microphones can be estimated during a calibration process and used to construct a matched-filter FBF [139].
Q10. What is the SNR at the output of the microphone array?
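The signal-to-noise ratio (SNR) at the output of the microphone array is therefore given by:
$$\mathrm{SNR}_{\mathrm{out}} = \frac{\sigma_s^2 \, |\mathbf{w}^H \mathbf{a}(\mathbf{k}_0)|^2}{\mathbf{w}^H \boldsymbol{\Sigma}_u \mathbf{w}}. \tag{30}$$
If the noise is spatially white, i.e. $\boldsymbol{\Sigma}_u = \sigma_u^2 \mathbf{I}$, then:
$$\mathrm{SNR}_{\mathrm{out}} = \frac{\sigma_s^2}{\sigma_u^2} \, \frac{|\mathbf{w}^H \mathbf{a}(\mathbf{k}_0)|^2}{\mathbf{w}^H \mathbf{w}} = \mathrm{SNR}_{\mathrm{in}} \, \frac{|\mathbf{w}^H \mathbf{a}(\mathbf{k}_0)|^2}{\mathbf{w}^H \mathbf{w}} \tag{31}$$
with $\mathrm{SNR}_{\mathrm{in}} = \sigma_s^2 / \sigma_u^2$.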
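The output SNR expression can be evaluated directly. The sketch below assumes a hypothetical four-microphone array with a unit-modulus steering vector and arbitrary example variances; it shows that, for spatially white noise, delay-and-sum weights improve the SNR by a factor equal to the number of microphones.

```python
import numpy as np

def snr_out(w, a, sigma_s2, Sigma_u):
    """Output SNR: sigma_s^2 |w^H a|^2 / (w^H Sigma_u w)."""
    signal_power = sigma_s2 * np.abs(np.vdot(w, a)) ** 2
    noise_power = np.real(np.vdot(w, Sigma_u @ w))   # real for Hermitian Sigma_u
    return signal_power / noise_power

n_mics = 4
phases = np.array([0.0, 0.7, 1.4, 2.1])   # arbitrary example phases
a = np.exp(-1j * phases)                  # steering vector towards the target
w = a / n_mics                            # delay-and-sum weights
sigma_s2, sigma_u2 = 2.0, 0.5             # example signal and noise variances
Sigma_u = sigma_u2 * np.eye(n_mics)       # spatially white noise

snr_in = sigma_s2 / sigma_u2
print(snr_out(w, a, sigma_s2, Sigma_u) / snr_in)   # array gain, here n_mics
```

With these weights, |w^H a|^2 = 1 and w^H w = 1/n_mics, so the ratio in the white-noise case reduces to n_mics.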
Q11. What is the popular model for channel-wise filtering?
This model is popular for channel-wise filtering in the context of computational auditory scene analysis (CASA), where the ILD and ITD are called interaural level and time differences, respectively, and are influenced by the shape of the pinna, the head and the torso [36].
Q12. What is the simplest way to define the spatial covariance of a diffuse sound field?
Under certain assumptions, the mean value $\bar{\mathbf{R}}_j(f)$ of this distribution can be defined as
$$\bar{\mathbf{R}}_j(f) = \mathbf{d}_j(f)\,\mathbf{d}_j^H(f) + \sigma_{\mathrm{rev}}^2\,\boldsymbol{\Omega}(f) \tag{21}$$
where $\mathbf{d}_j(f)$ is the steering vector in (7), $\boldsymbol{\Omega}(f)$ is the covariance matrix of a diffuse sound field whose entries $\Omega_{ii'}(\nu_f)$ are given in (5), and $\sigma_{\mathrm{rev}}^2$ is the power of early echoes and reverberation [113].
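A small numerical sketch of this mean covariance follows. Equations (5) and (7) are not reproduced in this section, so the code assumes the standard sinc coherence of a spherically isotropic diffuse field for Ω(f) and takes the steering vector as a given input; the microphone geometry and all numeric values are made up for illustration.

```python
import numpy as np

def diffuse_coherence(mic_pos, freq_hz, c=343.0):
    """Coherence matrix of a spherically diffuse field (assumed form of Omega)."""
    # Pairwise distances between microphones, shape (n_mics, n_mics)
    dist = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), hence sin(2 pi f d / c) / (2 pi f d / c)
    return np.sinc(2.0 * freq_hz * dist / c)

def mean_spatial_covariance(d, sigma_rev2, Omega):
    """Rank-1 direct-path term plus a diffuse reverberation term."""
    return np.outer(d, d.conj()) + sigma_rev2 * Omega

# Hypothetical 3-microphone array (positions in metres) at 1 kHz
mic_pos = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.10, 0.0, 0.0]])
d = np.exp(-2j * np.pi * 1000.0 * np.array([0.0, 1e-4, 2e-4]))  # example steering vector
Omega = diffuse_coherence(mic_pos, 1000.0)
R_bar = mean_spatial_covariance(d, sigma_rev2=0.3, Omega=Omega)
print(np.allclose(R_bar, R_bar.conj().T))   # the result is Hermitian
```

The first term has rank one and carries the direct path, while the diffuse term fills in the remaining rank, with coherence decaying as microphone spacing or frequency increases.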
Q13. What is the common approach to consider the AIRs as finite impulse response filters?
The simplest approach is to consider the AIRs as finite impulse response (FIR) filters modeled by their time-domain coefficients $a_j(t, \tau)$ or $a_j(\tau)$, $\tau \in \{0, \dots, L-1\}$.
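For the time-invariant case, the FIR model amounts to convolving the source signal with one filter per microphone. The following sketch assumes real-valued signals and uses made-up filter coefficients; the function name `source_image` is illustrative.

```python
import numpy as np

def source_image(s, airs):
    """Microphone signals produced by one source through time-invariant FIR AIRs.

    s    : source signal, shape (T,)
    airs : FIR coefficients a_j(tau), one row per microphone, shape (n_mics, L)
    Returns an array of shape (n_mics, T + L - 1).
    """
    return np.stack([np.convolve(s, a) for a in airs])

# Hypothetical 2-microphone example with L = 4 taps per AIR
airs = np.array([[1.0, 0.0, 0.5, 0.0],
                 [0.0, 0.8, 0.0, 0.3]])
s = np.zeros(6)
s[0] = 1.0                           # unit impulse as the source signal
x = source_image(s, airs)
print(x.shape)                       # (2, 9)
print(np.allclose(x[:, :4], airs))   # an impulse reproduces the AIR coefficients
```

The time-varying variant, with coefficients a_j(t, τ), would replace the single convolution with a filter that changes from sample to sample and is not covered by this sketch.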
Q14. How can a beamformer be computed in closed form?
It cannot even be computed in closed form: parameter estimation and beamforming are tightly coupled, as illustrated by the dashed arrow in Fig.