Abstract:
Audio and visual signals arriving from a common source are detected using a signal-level fusion technique. A probabilistic multimodal generation model is introduced and used to derive an information-theoretic measure of cross-modal correspondence. Nonparametric statistical density modeling techniques can characterize the mutual information between signals from different domains. By comparing the mutual information between different pairs of signals, it is possible to identify which person is speaking a given utterance and discount errant motion or audio from other utterances or nonspeech events.
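The association step described in the abstract, comparing mutual information across candidate audio/video pairings and keeping the best match, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each signal is reduced to one scalar feature per frame, mutual information is estimated with a leave-one-out Gaussian Parzen-window (kernel density) estimator using a fixed, hand-picked bandwidth, and the function names and synthetic data are hypothetical.

```python
import math
import random


def standardize(values):
    """Rescale a list to zero mean and unit variance so one bandwidth suits every signal."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def kde_mutual_information(x, y, bandwidth=0.3):
    """Leave-one-out Gaussian Parzen estimate of I(X;Y) in nats for paired 1-D samples."""
    n = len(x)
    two_h2 = 2.0 * bandwidth ** 2
    total = 0.0
    for i in range(n):
        px = py = pxy = 0.0
        for j in range(n):
            if j == i:
                continue  # leave-one-out: ignore the sample's own kernel
            kx = math.exp(-(x[i] - x[j]) ** 2 / two_h2)
            ky = math.exp(-(y[i] - y[j]) ** 2 / two_h2)
            px += kx
            py += ky
            pxy += kx * ky  # product kernel for the joint density
        # Kernel normalizing constants cancel in p(x,y) / (p(x) p(y)).
        total += math.log(pxy * (n - 1) / (px * py))
    return total / n


def associate_speaker(audio, candidates, bandwidth=0.3):
    """Return the candidate whose video feature shares the most estimated MI with the audio."""
    audio = standardize(audio)
    scores = {
        name: kde_mutual_information(audio, standardize(video), bandwidth)
        for name, video in candidates.items()
    }
    return max(scores, key=scores.get), scores


rng = random.Random(0)
audio = [rng.gauss(0.0, 1.0) for _ in range(200)]
candidates = {
    # Synthetic: this video feature depends nonlinearly on the audio.
    "person_a": [a * a + 0.2 * rng.gauss(0.0, 1.0) for a in audio],
    # Synthetic: unrelated motion, independent of the audio.
    "person_b": [rng.gauss(0.0, 1.0) for _ in audio],
}
best, scores = associate_speaker(audio, candidates)
print(best, scores)
```

In this toy data, person_a's feature depends on the square of the audio value, so its linear correlation with the audio is close to zero; a nonparametric MI estimate can still detect the dependence, which is one motivation for avoiding a Gaussian assumption.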
TL;DR: A method that exploits an information theoretic framework to extract optimized audio features using video information and achieves a speaker detection rate of 100% on in-house test sequences, and of 85% on most commonly used sequences.
TL;DR: The proposed algorithm uses unsupervised learning to form dictionaries of bimodal kernels from audio-visual material to robustly localize a speaker even in the presence of severe acoustic and visual distracters.
TL;DR: Three new methods for asynchrony detection based on co-inertia analysis (CoIA) and a fourth based on coupled hidden Markov models (CHMMs) are derived.
TL;DR: This letter formulates the problem as a likelihood maximization task and derives the associated conjugate expectation-maximization algorithm, which is tested and evaluated on the task of 3D localization of several speakers using both auditory and visual data.
TL;DR: This paper addresses the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not, and investigates the use of deep neural networks (DNNs) for this purpose.
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
TL;DR: In this paper, the problem of estimating a probability density function and determining its mode is discussed. Only estimates that are consistent and asymptotically normal are constructed.
TL;DR: This paper considers the synthesis of a new category of spatial filters, called minimum average correlation energy (MACE) filters, which produce sharp output correlation peaks with controlled peak values.
Q1. What have the authors contributed in "Speaker association with signal-level audiovisual fusion"?
In this paper, a probabilistic multimodal generation model is introduced and used to derive an information-theoretic measure of cross-modal correspondence.
Q2. What is the criterion for a prewhitening filter?
The computation can be decomposed into three stages: 1) prewhiten the images once (using the average spectrum of the images), followed by iterations of 2) updating the feature values ( ’s) using (14) and 3) solving for the projection coefficients using least squares and the penalty.
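Stages 1 and 3 of this computation can be sketched as follows. The sketch assumes NumPy, real-valued grayscale frames, projection coefficients applied to flattened prewhitened frames, and a plain L2 (ridge) penalty; the summary does not specify the penalty, so that choice is an assumption. Stage 2 depends on the update rule (14), which is not reproduced here, so the example substitutes known target feature values. The names `prewhiten` and `solve_projection` are hypothetical.

```python
import numpy as np


def prewhiten(images, eps=1e-8):
    """Stage 1: divide each image spectrum by the root of the sequence's average power spectrum."""
    spectra = np.fft.fft2(images, axes=(1, 2))           # per-image 2-D spectra
    avg_power = np.mean(np.abs(spectra) ** 2, axis=0)    # average spectrum over the sequence
    whitened = np.fft.ifft2(spectra / np.sqrt(avg_power + eps), axes=(1, 2))
    return whitened.real                                  # imaginary part is round-off only


def solve_projection(whitened, targets, penalty=1e-2):
    """Stage 3: L2-penalized least squares, alpha = (X^T X + penalty*I)^-1 X^T targets."""
    X = whitened.reshape(len(whitened), -1)               # one flattened frame per row
    gram = X.T @ X + penalty * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)


rng = np.random.default_rng(0)
images = rng.normal(size=(200, 8, 8))                    # synthetic 8x8 frames
whitened = prewhiten(images)

# Stage 2 (the feature update of (14)) is not reproduced; as a stand-in,
# target feature values are generated from a known projection.
true_alpha = rng.normal(size=64)
targets = whitened.reshape(200, -1) @ true_alpha
alpha = solve_projection(whitened, targets, penalty=1e-6)
```

Because the whitening filter is real and symmetric in frequency, real frames stay real after filtering, and the average power spectrum of the whitened sequence is flat.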
Q3. How can nonparametric statistical density models be used to represent complex joint densities of projected signals?
Nonparametric statistical density models can be used to represent complex joint densities of projected signals, and to successfully estimate mutual information.
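To illustrate why a nonparametric model helps, the sketch below evaluates a Gaussian Parzen-window estimate of a two-dimensional joint density, such as the joint density of a projected audio feature and a projected video feature. The kernel, bandwidth, function name, and bimodal synthetic data are assumptions for illustration. A single Gaussian fitted to the same data would place its peak near the sample mean at the origin, whereas the Parzen estimate reports almost no mass there.

```python
import math
import random


def parzen_joint_density(point, samples, bandwidth=0.3):
    """Gaussian Parzen-window estimate of a 2-D joint density at `point`."""
    px, py = point
    two_h2 = 2.0 * bandwidth ** 2
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))
    return norm * sum(
        math.exp(-((px - sx) ** 2 + (py - sy) ** 2) / two_h2) for sx, sy in samples
    )


rng = random.Random(1)
# Synthetic projected (audio, video) pairs drawn from two well-separated clusters.
samples = [
    (rng.gauss(center, 0.5), rng.gauss(center, 0.5))
    for center in (-2.0, 2.0)
    for _ in range(150)
]
at_cluster = parzen_joint_density((2.0, 2.0), samples)
at_mean = parzen_joint_density((0.0, 0.0), samples)  # sample mean is near the origin
print(at_cluster, at_mean)
```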
Q4. How can the authors learn the relationship between audio and video?
Using principles from information theory and nonparametric statistics, the authors show how an approach for learning maximally informative joint subspaces can find cross-modal correspondences.
Q5. What is the adaptation criterion for the projections?
The adaptation criterion, which the authors maximize in practice, is then a combination of the approximation to MI (11) and the regularization terms, given in (17), where the last term derives from the output energy constraint and involves the average autocorrelation function (taken over all images in the sequence).
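The general shape of such a criterion, a mutual-information term minus regularization penalties, can be sketched as below. Equations (11) and (17) are not reproduced in this summary, so every detail here is an assumption: the MI term is replaced by the Gaussian closed form -0.5 log(1 - rho^2) rather than the authors' approximation, the penalties are plain squared norms with hypothetical weights `lam_a`, `lam_v`, and `lam_e`, and the output-energy term is written as alpha_v^T R alpha_v, with R the pixel autocorrelation matrix averaged over all frames.

```python
import numpy as np


def gaussian_mi_proxy(y_a, y_v):
    """Stand-in MI term: -0.5*log(1 - rho^2), exact only for jointly Gaussian outputs."""
    rho = np.corrcoef(y_a, y_v)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)


def adaptation_criterion(alpha_a, alpha_v, audio, video,
                         lam_a=1e-2, lam_v=1e-2, lam_e=1e-2):
    """Hypothetical criterion: MI term minus L2 penalties minus an output-energy penalty."""
    y_a = audio @ alpha_a                 # projected audio feature, one value per frame
    y_v = video @ alpha_v                 # projected video feature, one value per frame
    R = video.T @ video / len(video)      # pixel autocorrelation matrix averaged over frames
    energy = alpha_v @ R @ alpha_v        # equals np.mean(y_v ** 2), the average output energy
    return float(gaussian_mi_proxy(y_a, y_v)
                 - lam_a * (alpha_a @ alpha_a)
                 - lam_v * (alpha_v @ alpha_v)
                 - lam_e * energy)


rng = np.random.default_rng(0)
audio = rng.normal(size=(300, 4))         # synthetic audio features per frame
video = rng.normal(size=(300, 16))        # synthetic pixel values per frame
video[:, 0] += 3.0 * audio[:, 0]          # pixel 0 moves with audio feature 0

alpha_a = np.eye(4)[0]
informative = adaptation_criterion(alpha_a, np.eye(16)[0], audio, video)
uninformative = adaptation_criterion(alpha_a, np.eye(16)[1], audio, video)
print(informative, uninformative)
```

The identity alpha_v^T R alpha_v = mean(y_v^2) is why an autocorrelation matrix averaged over the sequence can stand in for the average output energy of the projection.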
Q6. How can the mutual information of continuous random variables be estimated?
Mutual information for continuous random variables can be expressed in several ways as a combination of differential entropy terms [14]:
$$I(X;Y) = h(X) + h(Y) - h(X,Y) = h(X) - h(X \mid Y) = h(Y) - h(Y \mid X) \qquad (10)$$
Mutual information indicates the amount of information that one random variable conveys, on average, about another.
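The entropy decomposition I(X;Y) = h(X) + h(Y) - h(X,Y) can be checked numerically. The sketch below is a hypothetical example rather than the authors' estimator: it evaluates the decomposition in closed form for a bivariate Gaussian, then estimates the same three differential entropies from samples using leave-one-out Gaussian Parzen-window densities with a hand-picked bandwidth.

```python
import math
import random


def gaussian_mi_from_entropies(var_x, var_y, cov_xy):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) for a bivariate Gaussian, in nats."""
    h_x = 0.5 * math.log(2 * math.pi * math.e * var_x)
    h_y = 0.5 * math.log(2 * math.pi * math.e * var_y)
    det = var_x * var_y - cov_xy ** 2
    h_xy = 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)
    return h_x + h_y - h_xy


def parzen_entropies(x, y, bandwidth=0.3):
    """Leave-one-out Gaussian Parzen estimates of h(X), h(Y), h(X,Y) in nats."""
    n = len(x)
    two_h2 = 2.0 * bandwidth ** 2
    c1 = 1.0 / math.sqrt(2.0 * math.pi * bandwidth ** 2)  # 1-D Gaussian kernel constant
    h_x = h_y = h_xy = 0.0
    for i in range(n):
        px = py = pxy = 0.0
        for j in range(n):
            if j == i:
                continue  # leave-one-out: ignore the sample's own kernel
            kx = math.exp(-(x[i] - x[j]) ** 2 / two_h2)
            ky = math.exp(-(y[i] - y[j]) ** 2 / two_h2)
            px += kx
            py += ky
            pxy += kx * ky
        h_x -= math.log(c1 * px / (n - 1))
        h_y -= math.log(c1 * py / (n - 1))
        h_xy -= math.log(c1 * c1 * pxy / (n - 1))
    return h_x / n, h_y / n, h_xy / n


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x]  # unit variance, correlation 0.8
exact = gaussian_mi_from_entropies(1.0, 1.0, 0.8)      # -0.5*ln(1 - 0.8^2), about 0.51 nats
h_x, h_y, h_xy = parzen_entropies(x, y)
estimate = h_x + h_y - h_xy
print(exact, estimate)
```

Kernel smoothing inflates the estimated variances, so with a fixed bandwidth the sample-based estimate tends to come out somewhat below the closed-form value.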