Abstract:
Audio and visual signals arriving from a common source are detected using a signal-level fusion technique. A probabilistic multimodal generation model is introduced and used to derive an information theoretic measure of cross-modal correspondence. Nonparametric statistical density modeling techniques can characterize the mutual information between signals from different domains. By comparing the mutual information between different pairs of signals, it is possible to identify which person is speaking a given utterance and discount errant motion or audio from other utterances or nonspeech events.
TL;DR: An incremental correlation evaluation with mutual information is developed here, which significantly reduces the computational cost and is incorporated into a segmentation technique to localize a sound source region in the first visual frame of the current time window.
TL;DR: The goal is to study the two modalities jointly and to exploit their properties in a collaborative, robust way, in order to produce a reliable result that is as independent as possible of any a priori knowledge.
TL;DR: This thesis shows how informative features can be extracted from the visual modality, using an information-theoretic framework which gives us a quantitative measure of the relevance of individual features, and proves that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results.
TL;DR: A method that exploits the information-theoretic framework described in [1] to extract audio features that are optimal with respect to the video features, allowing the active speaker to be detected among different candidates.
TL;DR: A novel, text-dependent scheme for checking audiovisual synchronization in a video sequence is presented and custom visual features learned using a unique deep learning framework are presented and show that they outperform other commonly used visual features.
TL;DR: The author examines the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.
TL;DR: In this paper, the problem of estimating a probability density function and of determining the mode of the density is discussed. Only estimates that are consistent and asymptotically normal are constructed.
TL;DR: The synthesis of a new category of spatial filters that produces sharp output correlation peaks with controlled peak values is considered, and these filters are referred to as minimum average correlation energy filters.
Q1. What have the authors contributed in "Speaker association with signal-level audiovisual fusion" ?
In this paper, a probabilistic multimodal generation model is introduced and used to derive an information theoretic measure of cross-modal correspondence.
Q2. What is the criterion for a prewhitening filter?
Computing can be decomposed into three stages: 1) prewhiten the images once (using the average spectrum of the images), followed by iterations of 2) updating the feature values using (14), and 3) solving for the projection coefficients using least squares and the penalty.
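The prewhitening stage can be sketched as follows. This is a minimal illustration, assuming a stack of grayscale frames and whitening by the stack's average magnitude spectrum; the helper name `prewhiten` and the exact normalization are assumptions, not the paper's implementation:

```python
import numpy as np

def prewhiten(images, eps=1e-8):
    """Divide each frame's spectrum by the stack's average magnitude
    spectrum, so dominant low-frequency energy no longer swamps the
    later feature updates. `images` has shape (n, h, w)."""
    spectra = np.fft.fft2(images, axes=(-2, -1))
    avg_mag = np.abs(spectra).mean(axis=0)          # average spectrum over the stack
    whitened = np.fft.ifft2(spectra / (avg_mag + eps), axes=(-2, -1)).real
    return whitened

frames = np.random.RandomState(0).rand(4, 8, 8)
white = prewhiten(frames)
```

Because the average spectrum is computed once over all frames, this matches the "prewhiten the images once" structure of stage 1; stages 2 and 3 then iterate on the whitened data.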
Q3. How can nonparametric statistical density models be used to represent complex joint densities of projected?
Nonparametric statistical density models can be used to represent complex joint densities of projected signals, and to successfully estimate mutual information.
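A minimal numpy sketch of this idea: a Parzen (Gaussian-kernel) density estimate of the joint and marginal densities, plugged into a resubstitution estimate of mutual information. The bandwidth `bw` and the helper names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def gauss_kde(samples, points, bw):
    """Parzen density estimate with an isotropic Gaussian kernel.
    samples: (n, d) training data; points: (m, d) evaluation points."""
    d2 = ((points[:, None, :] - samples[None, :, :]) / bw) ** 2
    norm = (np.sqrt(2.0 * np.pi) * bw) ** samples.shape[1]
    return np.exp(-0.5 * d2.sum(-1)).mean(axis=1) / norm

def mi_estimate(x, y, bw=0.3):
    """Resubstitution estimate of I(X;Y) = E[log p(x,y)/(p(x)p(y))]."""
    xy = np.column_stack([x, y])
    pxy = gauss_kde(xy, xy, bw)
    px = gauss_kde(x[:, None], x[:, None], bw)
    py = gauss_kde(y[:, None], y[:, None], bw)
    return np.mean(np.log(pxy / (px * py)))
```

Comparing `mi_estimate` across different audio/video signal pairs is the kind of operation the paper uses to decide which pairing shares a common source.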
Q4. How can the authors learn the relationship between audio and video?
Using principles from information theory and nonparametric statistics, the authors show how an approach for learning maximally informative joint subspaces can find cross-modal correspondences.
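A simplified illustration of "maximally informative joint subspaces": for jointly Gaussian 1-D projections the mutual information is -0.5*log(1 - rho^2), so maximizing MI reduces to maximizing the correlation rho, i.e. canonical correlation analysis. The sketch below finds the top CCA pair with numpy; the helper name `first_canonical_pair` and the Gaussian shortcut are assumptions, whereas the paper uses a nonparametric MI estimate:

```python
import numpy as np

def first_canonical_pair(A, B):
    """Projections a, b maximizing corr(A @ a, B @ b) (top CCA pair).
    A: (n, da), B: (n, db) paired observations from the two modalities."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    # Whiten each modality via its thin SVD, then correlate the whitened bases.
    Ua, Sa, Vta = np.linalg.svd(A, full_matrices=False)
    Ub, Sb, Vtb = np.linalg.svd(B, full_matrices=False)
    U, S, Vt = np.linalg.svd(Ua.T @ Ub)
    a = Vta.T @ (U[:, 0] / Sa)    # map whitened direction back to input space
    b = Vtb.T @ (Vt[0] / Sb)
    return a, b, S[0]             # S[0] is the top canonical correlation
```

Under the Gaussian assumption the projections' MI is then `-0.5 * np.log(1 - S[0]**2)`; the nonparametric estimator in the paper removes that distributional assumption.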
Q5. What is the adaptation criterion for the projections?
The adaptation criterion, which the authors maximize in practice, is then a combination of the approximation to MI (11) and the regularization terms:(17)where the last term derives from the output energy constraint and is average autocorrelation function (taken over all images in the sequence).
Q6. What is the way to estimate the mutual information of continuous random variables?
Mutual information for continuous random variables can be expressed in several ways as a combination of differential entropy terms [14], as in (10). Mutual information indicates the amount of information that one random variable conveys on average about another.
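The decompositions referred to as (10) take the standard form (with $h(\cdot)$ denoting differential entropy):

```latex
I(X;Y) = h(X) - h(X \mid Y) = h(Y) - h(Y \mid X) = h(X) + h(Y) - h(X,Y)
```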