Journal ArticleDOI

Audio–Visual Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking

TL;DR: This work proposes a new framework in which particle flow (PF) is used to migrate particles smoothly from the prior to the posterior probability density, and develops two new algorithms, AV-ZPF-SMC-PHD and AV-NPF-SMC-PHD, in which the speaker states from previous frames are also considered for particle relocation.
Abstract: Sequential Monte Carlo probability hypothesis density (SMC-PHD) filtering is a popular method used recently for audio-visual (AV) multi-speaker tracking. However, due to the weight degeneracy problem, the posterior distribution can be represented poorly by the estimated probability when only a few particles are present around the peak of the likelihood density function. To address this issue, we propose a new framework where particle flow (PF) is used to migrate particles smoothly from the prior to the posterior probability density. We consider both zero and non-zero diffusion particle flows (ZPF/NPF), and develop two new algorithms, AV-ZPF-SMC-PHD and AV-NPF-SMC-PHD, where the speaker states from the previous frames are also considered for particle relocation. The proposed algorithms are compared systematically with several baseline tracking methods using the AV16.3, AVDIAR and CLEAR datasets, and are shown to offer improved tracking accuracy and average effective sample size (ESS).
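The weight degeneracy mentioned in the abstract is often quantified with the effective sample size. The sketch below assumes the common estimator ESS = 1 / Σ w_i² over normalised weights; the paper's exact definition is not reproduced on this page, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Estimate the effective sample size as 1 / sum(w_i^2).

    `weights` are non-negative particle weights; they are normalised
    here so that they sum to one before the estimate is computed.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


# Evenly spread weights give an ESS close to the particle count,
# while a single dominant weight drives the ESS towards one.
even_ess = effective_sample_size([1.0] * 100)
degenerate_ess = effective_sample_size([1.0] + [1e-12] * 99)
```

Under this estimator, a low ESS relative to the number of particles signals that only a few particles carry most of the weight, which is the situation the particle flow step is meant to relieve.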
Citations
Posted Content
Hao Wen1, Xiongjie Chen1, Georgios Papagiannis1, Conghui Hu1, Yunpeng Li1 
TL;DR: An end-to-end learning objective is presented based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of true states are unknown, and is assessed in state estimation tasks in robotics with simulated and real-world datasets.
Abstract: Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require the knowledge of true states which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function which can improve the estimation of states when a large portion of true states are unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.

7 citations


Cites background from "Audio–Visual Particle Flow SMC-PHD ..."

  • ...Sequential state estimation task, which involves estimating unknown state from a sequence of observations, finds a variety of applications including target tracking [1], [2], navigation [3], [4], and signal processing [5], [6]....


Dataset
05 Aug 2016
TL;DR: Audio data are used to improve the visual SMC-PHD (V-SMC-PHD) filter: the direction of arrival angles of the audio sources determine when to propagate the born particles and reallocate the surviving and spawned particles.
Abstract: The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as SMC-PHD filter) has proven to be a promising algorithm for multispeaker tracking. However, it has a heavy computational cost as surviving, spawned, and born particles need to be distributed in each frame to model the state of the speakers and to estimate jointly the variable number of speakers with their states. In particular, the computational cost is mostly caused by the born particles as they need to be propagated over the entire image in every frame to detect the new speaker presence in the view of the visual tracker. In this paper, we propose to use the audio data to improve the visual SMC-PHD (V-SMC-PHD) filter by using the direction of arrival angles of the audio sources to determine when to propagate the born particles and reallocate the surviving and spawned particles. The tracking accuracy of the audio-visual SMC-PHD (AV-SMC-PHD) algorithm is further improved by using a modified mean-shift algorithm to search and climb density gradients iteratively to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named as AVMS-SMC-PHD and sparse-AVMS-SMC-PHD, respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD based on the AV16.3, AMI, and CLEAR datasets.

5 citations

Proceedings ArticleDOI
Hao Wen1, Xiongjie Chen1, Georgios Papagiannis1, Conghui Hu1, Yunpeng Li1 
30 May 2021
TL;DR: In this paper, an end-to-end learning objective based on the maximisation of a pseudo-likelihood function is proposed to improve the estimation of states when a large portion of true states are unknown.
Abstract: Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require the knowledge of true states which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function which can improve the estimation of states when a large portion of true states are unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.

4 citations

Journal ArticleDOI
TL;DR: In this paper, a co-attention model is proposed to exploit the spatial and semantic correlations between the audio and visual features, which helps guide the extraction of discriminative features for better event localization.
Abstract: This work aims to temporally localize events that are both audible and visible in video. Previous methods mainly focused on temporal modeling of events with simple fusion of audio and visual features. In natural scenes, a video records not only the events of interest but also ambient acoustic noise and visual background, resulting in redundant information in the raw audio and visual features. Thus, direct fusion of the two features often causes false localization of the events. In this paper, we propose a co-attention model to exploit the spatial and semantic correlations between the audio and visual features, which helps guide the extraction of discriminative features for better event localization. Our assumption is that in an audio-visual event, shared semantic information between audio and visual features exists and can be extracted by attention learning. Specifically, the proposed co-attention model is composed of a co-spatial attention module and a co-semantic attention module that are used to model the spatial and semantic correlations, respectively. The proposed co-attention model can be applied to various event localization tasks, such as cross-modality localization and multimodal event localization. Experiments on the public audio-visual event (AVE) dataset demonstrate that the proposed method achieves state-of-the-art performance by learning spatial and semantic co-attention.

4 citations

Journal ArticleDOI
TL;DR: This article proposes a track-before-detect (TBD) approach for tracking extended stealth targets (ESTs) in a non-linear Gaussian system. It builds on the cubature Kalman (CK)-CBMeMBer filter, which is more accurate and more mathematically principled than the SMC-CBMeMBer filter.
Abstract: Joint detection and tracking of multiple extended targets (ETs) from image observations is a challenging radar technology; especially for extended stealth targets (ESTs). This work provides a new approach for the ESTs tracking under the non-linear Gaussian system based on track-before-detect (TBD) approach. The sequential Monte Carlo cardinality-balanced multi-target multi-Bernoulli (SMC-CBMeMBer) filter provides a good framework to cope with TBD approach. However, this filter suffers from the particles’ degradation problem seriously; especially for ETs tracking. Recently, the cubature Kalman (CK)-CBMeMBer filter which employs a third-degree spherical-radical cubature rule has been proposed to handle the non-linear models, the CK-CBMeMBer filter is more accurate and more principled in mathematical terms compared to SMC-CBMeMBer filter. To this point, the authors address a TBD of ESTs with extended CK-CBMeMBer filter based on random matrix model (RMM), which is an efficient way to track ellipsoidal ESTs. In RMM-ESTs scenarios, although the extension ellipsoid is efficient, it may not be accurate enough because of lacking useful information, such as size, shape, and orientation. Therefore, they introduce a filter composed of sub-ellipses; each one is represented by a RMM. The results confirm the effectiveness and robustness of the proposed filter.

3 citations

References
Journal ArticleDOI
TL;DR: Both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems are reviewed, with a focus on particle filters.
Abstract: Increasingly, for many application areas, it is becoming important to include elements of nonlinearity and non-Gaussianity in order to model accurately the underlying dynamics of a physical system. Moreover, it is typically crucial to process data on-line as it arrives, both from the point of view of storage costs as well as for rapid adaptation to changing signal characteristics. In this paper, we review both optimal and suboptimal Bayesian algorithms for nonlinear/non-Gaussian tracking problems, with a focus on particle filters. Particle filters are sequential Monte Carlo methods based on point mass (or "particle") representations of probability densities, which can be applied to any state-space model and which generalize the traditional Kalman filtering methods. Several variants of the particle filter such as SIR, ASIR, and RPF are introduced within a generic framework of the sequential importance sampling (SIS) algorithm. These are discussed and compared with the standard EKF through an illustrative example.
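As a rough illustration of the SIR variant named in the abstract, one filtering step can be sketched as propagate, weight, resample. This is a minimal sketch, not the tutorial's own pseudocode: the function name, the callable parameters `transition` and `likelihood`, and the choice of multinomial resampling at every step are all assumptions made for illustration.

```python
import random


def sir_step(particles, transition, likelihood, observation, rng):
    """One sampling-importance-resampling step (illustrative sketch).

    transition(x, rng)  -> a sample of the next state given state x
    likelihood(z, x)    -> p(z | x), up to a constant factor
    """
    # 1. Propagate every particle through the (assumed) dynamic model.
    predicted = [transition(x, rng) for x in particles]
    # 2. Weight each propagated particle by the observation likelihood.
    weights = [likelihood(observation, x) for x in predicted]
    if sum(weights) <= 0:
        raise ValueError("all particles have zero likelihood")
    # 3. Multinomial resampling: draw N particles with replacement,
    #    with probability proportional to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Usage with a random-walk model and a Gaussian-shaped likelihood.
rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(200)]
particles = sir_step(
    particles,
    transition=lambda x, r: x + r.gauss(0.0, 0.5),
    likelihood=lambda z, x: 2.718281828 ** (-0.5 * (z - x) ** 2),
    observation=1.0,
    rng=rng,
)
```

Resampling at every step keeps the sketch short; implementations often resample only when the effective sample size falls below a threshold.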

11,409 citations


Additional excerpts

  • ...(20) where the proposal distribution $q_k(m_{k|k-1} \mid \tilde{m}^{j}_{k|k-1}) \propto \mathcal{N}(\tilde{m}^{j}_{k|k-1}, \Sigma_q^{(2)})$, $\Sigma_q$ is the covariance of the proposal distribution [66], [67], and $\det$ is a determinant....


Proceedings ArticleDOI
07 Jan 2007
TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
Abstract: The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a very simple, randomized seeding technique, we obtain an algorithm that is Θ(log k)-competitive with the optimal clustering. Preliminary experiments show that our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.
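The randomized seeding described in this abstract is generally known as k-means++. The sketch below assumes its usual formulation: the first centre is drawn uniformly, and each later centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. The function names are hypothetical and points are assumed to be tuples of floats.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centres by squared-distance-weighted sampling."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centres = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest centre.
    d2 = [squared_distance(p, centres[0]) for p in points]
    while len(centres) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centre; fall back to uniform.
            new_centre = rng.choice(points)
        else:
            new_centre = rng.choices(points, weights=d2, k=1)[0]
        centres.append(new_centre)
        d2 = [min(d, squared_distance(p, new_centre)) for d, p in zip(d2, points)]
    return centres


seeds = kmeanspp_seeds([(0.0, 0.0), (0.1, 0.0), (9.0, 9.0), (9.1, 9.0)], k=2)
```

The seeds would then be handed to an ordinary k-means iteration; only the initialisation differs from the plain method.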

7,539 citations

Journal ArticleDOI
TL;DR: This article gives an overview of methods for sequential simulation from posterior distributions for discrete time dynamic models that are typically nonlinear and non-Gaussian, and shows how to incorporate local linearisation methods similar to those previously employed in the deterministic filtering literature.
Abstract: In this article, we present an overview of methods for sequential simulation from posterior distributions. These methods are of particular interest in Bayesian filtering for discrete time dynamic models that are typically nonlinear and non-Gaussian. A general importance sampling framework is developed that unifies many of the methods which have been proposed over the last few decades in several different scientific disciplines. Novel extensions to the existing methods are also proposed. We show in particular how to incorporate local linearisation methods similar to those which have previously been employed in the deterministic filtering literature; these lead to very effective importance distributions. Furthermore we describe a method which uses Rao-Blackwellisation in order to take advantage of the analytic structure present in some important classes of state-space models. In a final section we develop algorithms for prediction, smoothing and evaluation of the likelihood in dynamic models.

4,810 citations


Additional excerpts

  • ...(20) where the proposal distribution $q_k(m_{k|k-1} \mid \tilde{m}^{j}_{k|k-1}) \propto \mathcal{N}(\tilde{m}^{j}_{k|k-1}, \Sigma_q^{(2)})$, $\Sigma_q$ is the covariance of the proposal distribution [66], [67], and $\det$ is a determinant....


BookDOI
29 Nov 1995
TL;DR: The discrete Kalman filter as mentioned in this paper is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error.
Abstract: In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. The purpose of this paper is to provide a practical introduction to the discrete Kalman filter. This introduction includes a description and some discussion of the basic discrete Kalman filter, a derivation, description and some discussion of the extended Kalman filter, and a relatively simple (tangible) example with real numbers & results.
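The recursive predict and update cycle described in this abstract can be sketched for the simplest case. The example below assumes a scalar state that is nominally constant (random-walk model) and observed directly with additive noise; the function name and the parameters `q` (process noise variance) and `r` (measurement noise variance) are illustrative choices, not taken from the referenced text.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its error variance
    z    : new measurement of the state
    q, r : process and measurement noise variances (assumed known)
    """
    # Predict: the state is modelled as constant, so only the
    # uncertainty grows, by the process noise variance.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Filter a short sequence of noisy readings of a constant quantity.
estimate, variance = 0.0, 1.0
for reading in [0.39, 0.50, 0.48, 0.29, 0.25]:
    estimate, variance = kalman_step(estimate, variance, reading, q=1e-5, r=0.01)
```

With each measurement the error variance shrinks, so later readings move the estimate less than earlier ones.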

2,811 citations

Journal ArticleDOI
TL;DR: This article analyses the recently suggested particle approach to filtering time series and suggests that the algorithm is not robust to outliers for two reasons: the design of the simulators and the use of the discrete support to represent the sequentially updating prior distribution.
Abstract: This article analyses the recently suggested particle approach to filtering time series. We suggest that the algorithm is not robust to outliers for two reasons: the design of the simulators and the use of the discrete support to represent the sequentially updating prior distribution. Here we tackle the first of these problems.

2,608 citations


"Audio–Visual Particle Flow SMC-PHD ..." refers methods in this paper

  • ...the auxiliary particle filter [21], unscented particle filter [22], auxiliary SMC-PHD filter [23] and...
