Audio–Visual Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking

doi:10.1109/TMM.2019.2937185

Journal Article

Yang Liu¹, Volkan Kilic², Jian Guan³, Wenwu Wang¹

University of Surrey¹, Izmir Kâtip Çelebi University², Harbin Engineering University³

01 Apr 2020-IEEE Transactions on Multimedia (IEEE)-Vol. 22, Iss: 4, pp 934-948

TL;DR: This work proposes a new framework in which particle flow (PF) is used to migrate particles smoothly from the prior to the posterior probability density, and develops two new algorithms, AV-ZPF-SMC-PHD and AV-NPF-SMC-PHD, in which the speaker states from the previous frames are also considered for particle relocation.

Abstract: Sequential Monte Carlo probability hypothesis density (SMC-PHD) filtering has recently become a popular method for audio-visual (AV) multi-speaker tracking. However, due to the weight degeneracy problem, the posterior distribution can be represented poorly by the estimated probability when only a few particles are present around the peak of the likelihood density function. To address this issue, we propose a new framework where particle flow (PF) is used to migrate particles smoothly from the prior to the posterior probability density. We consider both zero and non-zero diffusion particle flows (ZPF/NPF), and develop two new algorithms, AV-ZPF-SMC-PHD and AV-NPF-SMC-PHD, where the speaker states from the previous frames are also considered for particle relocation. The proposed algorithms are compared systematically with several baseline tracking methods using the AV16.3, AVDIAR and CLEAR datasets, and are shown to offer improved tracking accuracy and average effective sample size (ESS).
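
The abstract refers to weight degeneracy and to the effective sample size (ESS) without defining either. As a rough illustration only, the sketch below uses the standard ESS estimate from the sequential Monte Carlo literature, 1 / Σ wᵢ² over normalised weights; whether the paper averages exactly this quantity is an assumption, and the function name and example weights are hypothetical.

```python
def effective_sample_size(weights):
    """Return the standard ESS estimate 1 / sum(w_i^2) for normalised weights w_i."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


# Evenly spread weights: every particle contributes, so ESS equals the particle count.
uniform = [1.0] * 100
# Degenerate weights: one particle carries almost all of the mass, so ESS is close to 1.
degenerate = [1.0] + [1e-12] * 99

print(effective_sample_size(uniform))
print(effective_sample_size(degenerate))
```

A low ESS relative to the number of particles is the usual symptom of the degeneracy the abstract describes: most particles carry negligible weight and contribute little to the posterior estimate.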

Citations

Posted Content

End-To-End Semi-supervised Learning for Differentiable Particle Filters

Hao Wen¹, Xiongjie Chen¹, Georgios Papagiannis¹, Conghui Hu¹, Yunpeng Li¹

University of Surrey¹

11 Nov 2020-arXiv: Learning

TL;DR: An end-to-end learning objective is presented, based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of the true states is unknown; it is assessed in state estimation tasks in robotics with simulated and real-world datasets.

Abstract: Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require knowledge of the true states, which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of the true states is unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.

7 citations

Cites background from "Audio–Visual Particle Flow SMC-PHD ..."

...Sequential state estimation task, which involves estimating unknown state from a sequence of observations, finds a variety of applications including target tracking [1], [2], navigation [3], [4], and signal processing [5], [6]....

Dataset

Data and Codes for reproducing the results in "Mean-Shift and Sparse Sampling Based SMC-PHD Filtering for Audio Informed Visual Speaker Tracking"

W Wang, Kilic

05 Aug 2016

TL;DR: The audio data is proposed to be used to improve the visual SMC-PHD (V-SMC-PHD) filter by using the direction of arrival angles of the audio sources to determine when to propagate the born particles and reallocate the surviving and spawned particles.

Abstract: The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as SMC-PHD filter) has proven to be a promising algorithm for multispeaker tracking. However, it has a heavy computational cost as surviving, spawned, and born particles need to be distributed in each frame to model the state of the speakers and to estimate jointly the variable number of speakers with their states. In particular, the computational cost is mostly caused by the born particles as they need to be propagated over the entire image in every frame to detect the new speaker presence in the view of the visual tracker. In this paper, we propose to use the audio data to improve the visual SMC-PHD (V-SMC-PHD) filter by using the direction of arrival angles of the audio sources to determine when to propagate the born particles and reallocate the surviving and spawned particles. The tracking accuracy of the audio-visual SMC-PHD (AV-SMC-PHD) algorithm is further improved by using a modified mean-shift algorithm to search and climb density gradients iteratively to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named as AVMS-SMC-PHD and sparse-AVMS-SMC-PHD, respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD based on the AV16.3, AMI, and CLEAR datasets.

5 citations

Proceedings Article

End-to-End Semi-supervised Learning for Differentiable Particle Filters

Hao Wen¹, Xiongjie Chen¹, Georgios Papagiannis¹, Conghui Hu¹, Yunpeng Li¹

University of Surrey¹

30 May 2021

TL;DR: In this paper, an end-to-end learning objective based on the maximisation of a pseudo-likelihood function is proposed to improve the estimation of states when a large portion of the true states is unknown.

4 citations

Journal Article

Audio-Visual Event Localization by Learning Spatial and Semantic Co-Attention

01 Jan 2023-IEEE Transactions on Multimedia

TL;DR: In this paper, a co-attention model is proposed to exploit the spatial and semantic correlations between the audio and visual features, which helps guide the extraction of discriminative features for better event localization.

Abstract: This work aims to temporally localize events that are both audible and visible in video. Previous methods mainly focused on temporal modeling of events with simple fusion of audio and visual features. In natural scenes, a video records not only the events of interest but also ambient acoustic noise and visual background, resulting in redundant information in the raw audio and visual features. Thus, direct fusion of the two features often causes false localization of the events. In this paper, we propose a co-attention model to exploit the spatial and semantic correlations between the audio and visual features, which helps guide the extraction of discriminative features for better event localization. Our assumption is that in an audio-visual event, shared semantic information between audio and visual features exists and can be extracted by attention learning. Specifically, the proposed co-attention model is composed of a co-spatial attention module and a co-semantic attention module that are used to model the spatial and semantic correlations, respectively. The proposed co-attention model can be applied to various event localization tasks, such as cross-modality localization and multimodal event localization. Experiments on the public audio-visual event (AVE) dataset demonstrate that the proposed method achieves state-of-the-art performance by learning spatial and semantic co-attention.

4 citations

Journal Article

Joint detection and tracking of non-ellipsoidal extended targets based on cubature Kalman-CBMeMBer sub-random matrices filter

Mohamed Barbary, Mohamed H. Abd ElAzeem¹

Arab Academy for Science, Technology & Maritime Transport¹

01 Dec 2020-IET Image Processing

TL;DR: In this article, the authors propose a new track-before-detect (TBD) approach to tracking extended stealth targets (ESTs) under the non-linear Gaussian system, built on the CK-CBMeMBer filter, which is more accurate and more principled in mathematical terms than the SMC-CBMeMBer filter.

Abstract: Joint detection and tracking of multiple extended targets (ETs) from image observations is a challenging problem in radar technology, especially for extended stealth targets (ESTs). This work provides a new approach to EST tracking under the non-linear Gaussian system, based on the track-before-detect (TBD) approach. The sequential Monte Carlo cardinality-balanced multi-target multi-Bernoulli (SMC-CBMeMBer) filter provides a good framework for the TBD approach. However, this filter suffers seriously from the particle degradation problem, especially for ET tracking. Recently, the cubature Kalman (CK)-CBMeMBer filter, which employs a third-degree spherical-radial cubature rule, has been proposed to handle non-linear models; the CK-CBMeMBer filter is more accurate and more principled in mathematical terms than the SMC-CBMeMBer filter. To this end, the authors address TBD of ESTs with an extended CK-CBMeMBer filter based on the random matrix model (RMM), which is an efficient way to track ellipsoidal ESTs. In RMM-EST scenarios, although the extension ellipsoid is efficient, it may not be accurate enough because it lacks useful information such as size, shape, and orientation. Therefore, they introduce a filter composed of sub-ellipses, each represented by an RMM. The results confirm the effectiveness and robustness of the proposed filter.

3 citations

References

Journal Article

A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking

M.S. Arulampalam¹, Simon Maskell², Neil Gordon², T. Clapp

Defence Science and Technology Organization¹, University of Cambridge²

01 Feb 2002-IEEE Transactions on Signal Processing

TL;DR: Both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems, with a focus on particle filters are reviewed.

Abstract: Increasingly, for many application areas, it is becoming important to include elements of nonlinearity and non-Gaussianity in order to model accurately the underlying dynamics of a physical system. Moreover, it is typically crucial to process data on-line as it arrives, both from the point of view of storage costs as well as for rapid adaptation to changing signal characteristics. In this paper, we review both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems, with a focus on particle filters. Particle filters are sequential Monte Carlo methods based on point mass (or "particle") representations of probability densities, which can be applied to any state-space model and which generalize the traditional Kalman filtering methods. Several variants of the particle filter such as SIR, ASIR, and RPF are introduced within a generic framework of the sequential importance sampling (SIS) algorithm. These are discussed and compared with the standard EKF through an illustrative example.
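
To make the "point mass" representation concrete, here is a minimal sketch of one bootstrap (SIR) cycle: predict, weight, resample. The 1-D random-walk dynamics, Gaussian likelihood, noise levels, and the names `sir_step` and `track` are all assumptions chosen for illustration; they are not taken from the tutorial.

```python
import math
import random


def sir_step(particles, observation, rng, process_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle of a bootstrap (SIR) particle filter."""
    # Predict: push each particle through the assumed random-walk dynamics.
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles with probability proportional to their weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(observations, n_particles=500, seed=0):
    """Run the filter over a sequence and return the posterior-mean estimates."""
    rng = random.Random(seed)
    # Assumed broad Gaussian prior over the initial state.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = sir_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

Calling `track([5.0] * 30)` feeds thirty identical observations to the filter, and the estimates should settle near 5. Resampling at every step is the simplest choice; many implementations resample only when the effective sample size falls below a threshold.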

11,409 citations

Additional excerpts

...(20) where the proposal distribution $q_k(m_{k|k-1} \mid \tilde{m}^{j}_{k|k-1}) \propto \mathcal{N}(\tilde{m}^{j}_{k|k-1}, \Sigma^{(2)}_q)$, $\Sigma_q$ is the covariance of the proposal distribution [66], [67], and det is a determinant....

Proceedings Article

k-means++: the advantages of careful seeding

David Arthur¹, Sergei Vassilvitskii¹

Stanford University¹

07 Jan 2007

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.

Abstract: The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a very simple, randomized seeding technique, we obtain an algorithm that is Θ(log k)-competitive with the optimal clustering. Preliminary experiments show that our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.
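
The abstract does not spell out the seeding rule. In the k-means++ paper it is D² sampling: the first centre is drawn uniformly at random, and each later centre is drawn with probability proportional to a point's squared distance from its nearest centre chosen so far. The sketch below shows only that seeding step; the function names and the fallback for coincident points are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """Choose k initial centres by D^2-weighted sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First centre: uniform draw from the data.
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a chosen centre; fall back to a uniform draw.
            centres.append(rng.choice(points))
        else:
            # Points far from all current centres are proportionally more likely.
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres
```

The returned centres would then initialise the usual k-means iterations. Points that already coincide with a centre have zero weight, so they are not chosen again unless every remaining point is a duplicate.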

7,539 citations

Journal Article

On sequential Monte Carlo sampling methods for Bayesian filtering

Arnaud Doucet¹, Simon J. Godsill¹, Christophe Andrieu¹

University of Cambridge¹

01 Jul 2000-Statistics and Computing

TL;DR: An overview of methods for sequential simulation from posterior distributions for discrete time dynamic models that are typically nonlinear and non-Gaussian, showing how to incorporate local linearisation methods similar to those previously employed in the deterministic filtering literature.

Abstract: In this article, we present an overview of methods for sequential simulation from posterior distributions. These methods are of particular interest in Bayesian filtering for discrete time dynamic models that are typically nonlinear and non-Gaussian. A general importance sampling framework is developed that unifies many of the methods which have been proposed over the last few decades in several different scientific disciplines. Novel extensions to the existing methods are also proposed. We show in particular how to incorporate local linearisation methods similar to those which have previously been employed in the deterministic filtering literature; these lead to very effective importance distributions. Furthermore, we describe a method which uses Rao-Blackwellisation in order to take advantage of the analytic structure present in some important classes of state-space models. In a final section we develop algorithms for prediction, smoothing and evaluation of the likelihood in dynamic models.

4,810 citations

Additional excerpts

...(20) where the proposal distribution $q_k(m_{k|k-1} \mid \tilde{m}^{j}_{k|k-1}) \propto \mathcal{N}(\tilde{m}^{j}_{k|k-1}, \Sigma^{(2)}_q)$, $\Sigma_q$ is the covariance of the proposal distribution [66], [67], and det is a determinant....

Book

An Introduction to the Kalman Filter

Greg Welch¹, Gary Bishop¹

University of North Carolina at Chapel Hill¹

29 Nov 1995

TL;DR: The discrete Kalman filter as mentioned in this paper is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error.

Abstract: In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. The purpose of this paper is to provide a practical introduction to the discrete Kalman filter. This introduction includes a description and some discussion of the basic discrete Kalman filter, a derivation, description and some discussion of the extended Kalman filter, and a relatively simple (tangible) example with real numbers & results.
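
The recursive structure the abstract describes can be sketched for the simplest case, a scalar state assumed constant between steps (state transition and measurement matrices both equal to 1). The function name, the noise variances `q` and `r`, and the initial values are assumptions for this example, not values from the text.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state model; returns the estimate after each step."""
    x, p = x0, p0  # state estimate and its error variance
    estimates = []
    for z in measurements:
        # Time update: the state is assumed unchanged, so only the uncertainty grows by q.
        p = p + q
        # Measurement update: the gain balances prior variance p against measurement variance r.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# With q = 0, r = 1, x0 = 0 and p0 = 1, feeding n measurements equal to 1.0
# gives the estimate n / (n + 1) after step n.
print(kalman_1d([1.0] * 9, q=0.0, r=1.0))
```

Each step uses only the previous estimate and variance, which is the recursive property the abstract highlights: no measurement history needs to be stored.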

2,811 citations

Journal Article

Filtering via Simulation: Auxiliary Particle Filters

Michael K. Pitt¹, Neil Shephard²

Imperial College London¹, Nuffield College²

01 Jun 1999-Journal of the American Statistical Association

TL;DR: This article analyses the recently suggested particle approach to filtering time series and suggests that the algorithm is not robust to outliers for two reasons: the design of the simulators and the use of the discrete support to represent the sequentially updating prior distribution.

Abstract: This article analyses the recently suggested particle approach to filtering time series. We suggest that the algorithm is not robust to outliers for two reasons: the design of the simulators and the use of the discrete support to represent the sequentially updating prior distribution. Here we tackle the first of these problems.

2,608 citations