Sequential Monte Carlo fusion of sound and vision for speaker tracking
Citations
Multimodal fusion for multimedia analysis: a survey
Data fusion for visual tracking with particles
Apparatus and method performing audio-video sensor fusion for object localization, tracking, and separation
Pixels that sound
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings
References
Novel approach to nonlinear/non-Gaussian Bayesian state estimation
The generalized correlation method for estimation of time delay
Frequently Asked Questions (14)
Q2. What is the way to separate the sound likelihood from the image?
Since the sound likelihood depends only on D, which in turn depends on the configuration X only via the image x coordinate, the sampling is separated in two stages by “partitioned sampling” [11].
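The two-stage idea described above can be illustrated with a minimal sketch. It assumes a toy two-dimensional state (x, y), random-walk dynamics, and Gaussian likelihoods; none of these are the paper's actual models, and the names `partitioned_step`, `resample`, `sigma_dyn`, `sigma_sound` and `sigma_image` are hypothetical. The point is only the ordering: the x coordinate is propagated and resampled against the sound likelihood first, and the remaining component is handled afterwards against the image likelihood.

```python
import numpy as np


def resample(states, log_w, rng):
    """Multinomial resampling from unnormalised log-weights."""
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(states), size=len(states), p=w)
    return states[idx]


def partitioned_step(states, z_sound, z_image, rng,
                     sigma_dyn=0.5, sigma_sound=1.0, sigma_image=0.3):
    """One time step of a toy two-stage (partitioned) sampler.

    states  : (N, 2) array of particles, columns are x and y
    z_sound : scalar sound measurement, informative about x only (assumption)
    z_image : length-2 image measurement of (x, y) (assumption)
    """
    states = states.copy()

    # Stage 1: propagate x only, weight by the sound likelihood, resample.
    states[:, 0] += rng.normal(0.0, sigma_dyn, size=len(states))
    log_w_sound = -0.5 * ((z_sound - states[:, 0]) / sigma_sound) ** 2
    states = resample(states, log_w_sound, rng)

    # Stage 2: propagate y, weight by the image likelihood, resample.
    states[:, 1] += rng.normal(0.0, sigma_dyn, size=len(states))
    sq_dist = ((z_image - states) ** 2).sum(axis=1)
    log_w_image = -0.5 * sq_dist / sigma_image ** 2
    return resample(states, log_w_image, rng)
```

Because the sound likelihood is a factor of the full likelihood, resampling on it in the first stage concentrates particles in plausible x regions before any effort is spent on the remaining component.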
Q3. What is the importance of g for inter-frame stability?
It is important for inter-frame stability that g is defined relative to global image statistics, rather than local statistics gathered along one normal, or from the normals of one outline curve (which would be economical, computationally).
Q4. What is the particle filtering technique?
Since the broadest variations in X are due to x and y, separating x and y leads to a considerable improvement in sampling efficiency, palpable as a reduction in the number of particles needed per time step.
Q5. What is the way to measure acoustic noise?
In acoustic environments with relatively low noise and reverberation, triangulation based on the TDOA of measurements at a microphone pair [5, 12] is effective.
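Triangulation of this kind needs a TDOA estimate for the microphone pair first. The sketch below shows one common way to obtain it, generalized cross-correlation with PHAT weighting. The PHAT weighting is chosen here purely for illustration and is not confirmed as the authors' choice; the function name `gcc_phat_tdoa` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_tdoa(sig_a, sig_b, fs, max_delay):
    """Estimate the delay of sig_b relative to sig_a, in seconds.

    sig_a, sig_b : 1-D arrays, one per microphone
    fs           : sampling rate in Hz
    max_delay    : largest physically possible delay in seconds
                   (microphone spacing divided by the speed of sound)
    A positive result means sig_b lags sig_a.
    """
    n = len(sig_a) + len(sig_b)  # zero-pad to avoid circular wrap-around
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)

    # Cross-power spectrum, whitened so that only phase is kept (PHAT).
    cross = spec_b * np.conj(spec_a)
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to lags in [-max_shift, +max_shift] samples.
    max_shift = max(1, min(int(round(max_delay * fs)), n // 2))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

Restricting the search to the physically possible lags mirrors the bounded TDOA range used in the sound likelihood; in noisy or reverberant rooms the correlation typically has several peaks, which is why a single peak pick such as this one is most reliable when noise and reverberation are low.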
Q6. What is the simplest way to estimate the configuration of a head?
The general tracking problem involves the recursive estimation of the filtering distribution $p(X_k \mid z_{1:k})$, with $z = (z_A, z_I)$ and the subscript $1{:}k$ denoting all the observations from time 1 to time $k$, from which estimates of the configuration $X$ can be obtained.
Q7. What is the result of the first experiment on the “motion” sequence?
The particle filter successfully tracks the subject during the period of normal motion to the left, but loses track during the rapid motion to the right.
Q8. What is the purpose of this paper?
The authors establish design principles and demonstrate a working system that fuses stereophonic sound localisation with active contour tracking.
Q9. What is the way to measure the correlation between sound and audio?
So far studies have been based on stored audiovisual sequences, but preliminary indications (based on software profiling) suggest that a real-time system should be quite feasible without special hardware, and work is currently in progress to achieve this.
Q10. How much of the field of view is fixed?
In experiments, the authors fixed the rate constants for the x and y directions to $10\,\mathrm{s}^{-1}$, and $v^{(x)}$ and $v^{(y)}$ to 10% and 5% of the field of view in the respective directions, per second.
Q11. What is the recursive distribution of the xkj?
The general recursions to compute the filtering distribution are given by

$$p(X_k \mid z_{1:k-1}) = \int p(X_k \mid X_{k-1})\, p(dX_{k-1} \mid z_{1:k-1})$$

$$p(X_k \mid z_{1:k}) \propto L(z_{A,k} \mid D_k)\, L(z_{I,k} \mid C_k)\, p(X_k \mid z_{1:k-1}),$$

where the first, or prediction, step uses the dynamical model and the filtering distribution at the previous time step to compute the one-step-ahead prediction distribution, which then acts as the prior for the configuration in the second, or update, step, where it is combined with the likelihood to obtain the filtering distribution.
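A particle approximation of this prediction and update recursion can be sketched as follows. This is a plain bootstrap filter on a toy scalar state with random-walk dynamics and Gaussian likelihoods, all of which are assumptions made for illustration rather than the paper's models. The function name `particle_filter_step` and the `sigma_*` parameters are hypothetical.

```python
import numpy as np


def particle_filter_step(particles, z_a, z_i, rng,
                         sigma_dyn=0.5, sigma_a=1.0, sigma_i=0.3):
    """One prediction/update cycle of a bootstrap particle filter.

    particles : (N,) samples approximating the previous filtering distribution
    z_a, z_i  : scalar "audio" and "image" measurements of the state (toy model)
    """
    # Prediction: draw from the dynamical model p(X_k | X_{k-1}).
    predicted = particles + rng.normal(0.0, sigma_dyn, size=particles.shape)

    # Update: the weight is the product of the two likelihoods,
    # so the log-weights are summed.
    log_w = (-0.5 * ((z_a - predicted) / sigma_a) ** 2
             - 0.5 * ((z_i - predicted) / sigma_i) ** 2)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()

    # Resample so the returned particles are equally weighted samples
    # from the new filtering distribution.
    idx = rng.choice(len(predicted), size=len(predicted), p=w)
    return predicted[idx]
```

Calling the function once per frame, with the particles returned by the previous call, yields a sample-based estimate of the filtering distribution at each time step; the particle mean is one possible point estimate of the configuration.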
Q12. What was the result of the first experiment on the “motion” sequence?
The system was only very roughly calibrated, and proved to be robust to the exact values chosen for the intrinsic parameters of the camera, and did not require extremely careful placement of the microphones relative to the camera.
Q13. What is the particle filter architecture used in this experiment?
The particular particle filter architecture adopted here deviates from the standard particle filter, and makes the best use of the properties of the model.
Q14. What is the probability of all measurements being due to clutter?
In (2), $q_0$ is the prior probability of all measurements being due to clutter, $q_i$, $i = 1, \ldots, N$, is the prior probability of the $i$-th measurement corresponding to the true TDOA, $c$ is a normalising constant, and $I_{\mathcal{D}}(\cdot)$ is the indicator function for the set $\mathcal{D} = [-D_{\max}, D_{\max}]$.