
Proceedings ArticleDOI

Cross-modal localization through mutual information

10 Oct 2009, pp. 5597-5602



© [2009] IEEE. Reprinted, with permission, from [Alempijevic, A.; Kodagoda,
S.; Dissanayake, G. Cross-modal localization through mutual information. Intelligent
Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference]. This
material is posted here with permission of the IEEE. Such permission of the IEEE
does not in any way imply IEEE endorsement of any of the University of Technology,
Sydney's products or services. Internal or personal use of this material is permitted.
However, permission to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or redistribution must be
obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to
view this document, you agree to all provisions of the copyright laws protecting it.

Cross-Modal Localization Through Mutual Information
Alen Alempijevic, Sarath Kodagoda and Gamini Dissanayake
Abstract: Relating information originating from disparate sensors observing a given scene is a challenging task, particularly when an appropriate model of the environment or the behaviour of any particular object within it is not available. One possible strategy to address this task is to examine whether the sensor outputs contain information which can be attributed to a common cause. In this paper, we present an approach to localise this embedded common information through an indirect method of estimating mutual information between all signal sources. The ability of L_1 regularization to enforce sparseness of the solution is exploited to identify a subset of signals that are related to each other from among a large number of sensor outputs. As opposed to the conventional L_2 regularization, the proposed method leads to faster convergence with much reduced spurious associations. Simulation and experimental results are presented to validate the findings.
I. INTRODUCTION
The world market for sensors and wireless communication
technologies is ever growing, prompting the rapid deploy-
ment of wireless sensor networks [1]. Therefore, it is not
unreasonable to assume that sensors will be omnipresent in
the near future. With the presence of a large number of sensors and signals, there is growing interest in cross-modal signal analysis. The objective is not necessarily to geometrically relate the sensors; the emphasis is rather placed on relating parts of the sensor signals. The following fundamental concept in perception is exploited extensively in this paper: motion has, in principle, greater power to specify properties of an object than purely spatial information. Thus, relating signals can generally be carried out by comparing vectors of signals that have been monitored over time. One important aspect of such signal processing is to localize the components of a particular signal that best correlate with another signal originating from the same source.
This type of analysis is reported in various fields including biomedical engineering, climatology, network analysis and economics. In biomedical research, heart rate fluctuations are examined against several interacting physiological mechanisms, including visual cortex activity and respiratory rate [10], in order to determine the neurological status of infants. In climatology, dynamic weather patterns in a particular location are correlated with synoptic meteorological data gathered over time [13]. In economics, the revenue performance of a market is correlated with a large set of economic and social criteria [15].
A. Alempijevic, S. Kodagoda and G. Dissanayake are with the ARC Centre of Excellence for Autonomous Systems (CAS), University of Technology, Sydney, Australia. {a.alempijevic, s.kodagoda, g.dissanayake}@cas.edu.au
There are a number of techniques that are suitable for detecting the statistical dependence of signals. Techniques such as Canonical Correlation Analysis and Principal Component Analysis rely on correlation, a second-order statistic. Alternative non-parametric techniques are Kendall's tau, cross-correlograms, Mutual Information (MI) and Independent Component Analysis. The selected metric is required to identify non-linear, higher (than second) order statistical dependence between signals. The measure of statistical dependence should be valid without any assumptions about the underlying probability density function and should be extendible to input signals of high dimensionality. Mutual information is identified as the most promising metric, fulfilling all requirements.
The methods for mutual information (MI) estimation can be classified into two broad categories, based on whether mutual information is computed directly or the condition for maximum MI is obtained indirectly through an optimization process that does not involve computing MI [2], [7]. The most natural way of estimating MI via the direct method is to use a nonparametric density estimator together with the theoretical expression for entropy. However, the definition of entropy requires an integration of the underlying PDF over the set of all possible outcomes. In practice, there is no closed form solution for this integral. Combining the nonparametric density estimator with an approximation of the theoretical entropy has been widely described in the literature to overcome this problem [16]. However, this requires pairwise comparisons of all permutations of input signals to find the most informative statistically dependent pairings, which is not feasible for a large number of signals, such as images.
The indirect MI estimation method determines the most mutually informative signal pairings through a mapping of the signals into a two dimensional space. The key to obtaining the most informative mapping is a technique that computes the effect of the mapping parameters on the information content in the lower dimensional space. Fisher et al. [8] demonstrate a linear mapping of the signals that maximises MI by defining an objective function that operates on the resulting two dimensional space.
This paper builds upon Fisher's work [8] and our previous research on indirect MI estimation [2] by introducing the L_1 norm to obtain a sparse linear mapping. The L_1 norm has recently found extensive use in solving convex optimisation problems for arbitrary signals estimated from an incomplete set of measurements corrupted by noise [5], and it also exhibits a very useful property: the preservation of the sparsity of the relationship between the multidimensional random variables. The L_1 norm as a penalty function on the magnitudes of the mapping coefficients is shown to be suited to the applications examined in this paper, where the mutually informative signals are usually embedded in a large number of non-informative signals.
The remainder of this document is organised as follows. Section II outlines an indirect estimation algorithm for MI. Section III describes the process of finding the maximum MI with an L_1 penalty norm and the optimization parameters. Experimental results are presented in Section IV. Section V concludes the paper, providing future research directions.
II. INDIRECT ESTIMATION OF MUTUAL INFORMATION
THROUGH NON-LINEAR MAPPINGS
Mutual information between two random vectors X_1, X_2 can be defined as follows.

I(X_1; X_2) = H(X_1) + H(X_2) - H(X_1, X_2)    (1)
where H(X_1) and H(X_2) are the entropies of X_1 and X_2 respectively, and H(X_1, X_2) is the joint entropy term. Direct estimation of MI requires calculation of the entropy terms in (1). The entropy H(X_1), also referred to as Shannon's entropy, of a random variable X_1 with density p(x_1) is given by

H(X_1) = -\int_{\Omega} p(x_1) \log(p(x_1)) \, dx_1    (2)

where \Omega is the set of possible outcomes.
There are two distinct problems that need addressing when calculating entropy in this form: first, estimating the underlying unknown PDF of the random variable to obtain p(x_1) over the entire space \Omega, and second, the integration over the set of all possible outcomes. Both are addressed through indirect estimation.
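To make the direct route concrete, a deliberately naive plug-in estimate of (2) for a scalar signal pairs a Parzen (Gaussian kernel) density estimate with a discretised integral. The following sketch is illustrative only and is not the estimator developed in this paper; the kernel width sigma and grid size are arbitrary choices.

```python
import numpy as np

def plugin_entropy(x, sigma=0.1, grid=512):
    # Plug-in estimate of Eq. (2) for a scalar signal x: estimate p(x) with a
    # Parzen (Gaussian kernel) density and approximate the integral over the
    # outcome space Omega on a finite grid; these are the two steps the
    # direct method has to approximate in practice.
    u = np.linspace(x.min() - 3 * sigma, x.max() + 3 * sigma, grid)
    diffs = u[:, None] - x[None, :]
    p = np.exp(-diffs**2 / (2 * sigma**2)).mean(axis=1) / np.sqrt(2 * np.pi * sigma**2)
    du = u[1] - u[0]
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum() * du)
```

Estimating I(X_1; X_2) this way additionally requires the joint density, and repeating the computation for every candidate signal pairing is what becomes infeasible for large signal sets.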
Mutual information between two high dimensional signals X_1 and X_2 can be indirectly estimated by mapping the signals into a lower dimensional space, exploiting the data processing inequality [6] that defines lower bounds on mutual information. The inequality states

I(g(\alpha_1, X_1); g(\alpha_2, X_2)) \le I(X_1; X_2)    (3)
for any random vectors X_1 and X_2 and any functions g(\alpha, \cdot) defined on the range of X_1 and X_2 respectively. The generality of the data processing inequality implies that there are no constraints on the choice of transformations g(\cdot). Furthermore, as the functions g(\alpha, \cdot) map the input data into a lower dimensional space, computing the information content I(g(\alpha_1, X_1); g(\alpha_2, X_2)) is significantly easier.
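The inequality holds in particular for deterministic transformations of the two signals. A small discrete illustration (not from the paper; the signals and the mapping g below are made up) shows that MI computed after a lossy mapping never exceeds the MI of the original variables:

```python
import numpy as np

def discrete_mi(x, y):
    # Mutual information (in nats) of two discrete sequences, computed
    # exactly from their joint histogram.
    xi = np.unique(x, return_inverse=True)[1]
    yi = np.unique(y, return_inverse=True)[1]
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

rng = np.random.default_rng(0)
x1 = rng.integers(0, 8, 5000)
x2 = (x1 + rng.integers(0, 2, 5000)) % 8       # statistically dependent on x1
g = lambda v: v // 4                            # a lossy deterministic mapping
print(discrete_mi(x1, x2), discrete_mi(g(x1), g(x2)))  # second value is never larger
```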
The mappings Y_1 = g(\alpha_1, X_1) and Y_2 = g(\alpha_2, X_2) can be achieved through any differentiable function, such as the hyperbolic tangent [11] or multi-layer perceptrons [8]. However, linear projections are preferred because the projection coefficients themselves can then be used as a measure of how much each individual signal in the random vectors X_1, X_2 contributes to the mutual information of the resulting lower dimensional Y_1, Y_2. We now present how to select the parameters of the linear mappings Y_1 = \alpha_1 X_1 and Y_2 = \alpha_2 X_2, thus selecting a subset of the most mutually informative signals from the sets of signals X_1 and X_2 without the need to estimate MI on all permutations of signal sets.
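For concreteness, the linear mappings and the role of the projection coefficients can be written out as follows. This is a schematic sketch with invented shapes and data (mirroring the five-plus-one signal layout used later in Simulation 1), not the optimisation itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100                                          # samples per signal
X1 = rng.standard_normal((5, n))                 # sensor 1: N1 = 5 signals
X2 = X1[:1] + 0.1 * rng.standard_normal((1, n))  # sensor 2: N2 = 1 signal, shares signal 1

alpha1 = rng.standard_normal((1, 5))             # projection coefficients to be optimised
alpha2 = rng.standard_normal((1, 1))

Y1 = alpha1 @ X1                                 # low dimensional mappings, one row each
Y2 = alpha2 @ X2

# After optimisation, the magnitude of alpha1[0, i] indicates how strongly
# signal i of X1 contributes to the mutually informative subspace, so the
# informative subset can be read off without testing every signal pairing.
print(np.abs(alpha1[0]))
```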
III. OPTIMIZATION OF MAPPINGS VIA
INFORMATION MAXIMISATION PRINCIPLE
Finding the optimal projections α_1 and α_2 would require solving a complex non-linear optimization problem. It is generally not feasible to obtain a closed form solution to this problem without numerical methods such as Powell's direction set method [3]. However, the high cost of computing MI, together with the fact that the parameter vector α has the dimension of the input signals in the case of a linear map, makes direct optimization intractable.
An entropy estimation measure proposed by Fisher et al. [8] allows the gradient of the measure with respect to the mapping parameters to be obtained. They proposed an unsupervised learning method by which the mappings g_1(\cdot) and g_2(\cdot) can be estimated indirectly, without computing mutual information. The maximisation of MI is achieved by maximising the entropies H(Y_1) and H(Y_2) and minimising the joint entropy H(Y_1, Y_2) in (1). The entropies H(Y_1) and H(Y_2) can be maximised by selecting the mapping parameters so that the data in the lower dimensional space resemble a uniform distribution. Likewise, the joint entropy H(Y_1, Y_2) can be minimised by selecting the mapping parameters such that the joint distribution of (Y_1, Y_2) is furthest from a uniform distribution.
Thus, maximisation of MI can be achieved by maximising the objective function J,

J = J_{Y_1} + J_{Y_2} - J_{Y_{1,2}}    (4)

where each element of J_{Y_1}, J_{Y_2}, J_{Y_{1,2}} is of the form

\frac{1}{2} \int_{\Omega} \left( f(u) - \hat{f}(y_u) \right)^2 du    (5)

where \Omega indicates the nonzero region over which the integration is evaluated. Therefore, (5) is the integrated squared distance between the output distribution (evaluated by a Parzen density estimator \hat{f}(y_u) at a point u over a set of observations y) and the desired output distribution f(u).
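In one output dimension, an element of the form (5) can be evaluated numerically by comparing a Parzen estimate of the mapped samples with a uniform target density over the support. The sketch below is a minimal illustration with arbitrary kernel width and support values, not the authors' implementation:

```python
import numpy as np

def objective_element(y, d=2.0, sigma=0.25, grid=200):
    # Eq. (5) in one output dimension: half the integrated squared distance
    # between a Parzen (Gaussian kernel) estimate of the mapped samples y and
    # the desired uniform density over the support [-d/2, d/2].
    u = np.linspace(-d / 2, d / 2, grid)
    f_target = np.full(grid, 1.0 / d)                    # uniform target density
    diffs = u[:, None] - y[None, :]
    f_hat = np.exp(-diffs**2 / (2 * sigma**2)).mean(axis=1) / np.sqrt(2 * np.pi * sigma**2)
    return 0.5 * ((f_target - f_hat)**2).sum() * (u[1] - u[0])
```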
It can be shown that the gradient of each element of J with respect to the mapping parameters α can be computed as follows [8].

\frac{\partial J}{\partial \alpha} = \frac{\partial J}{\partial \hat{f}} \, \frac{\partial \hat{f}}{\partial g(\alpha, x)} \, \frac{\partial g(\alpha, x)}{\partial \alpha} = \frac{1}{N} \sum_i \epsilon_i \, \frac{\partial}{\partial \alpha} g(\alpha, x)
Note that \partial g(\alpha, x) / \partial \alpha is a constant, as we have assumed g(\cdot) is a linear projection. The term \epsilon_i is [8]

\epsilon_i^{(k)} = b_r(y_i^{(k-1)}) - \frac{1}{N} \sum_{j \ne i} \kappa_a(y_i^{(k-1)} - y_j^{(k-1)}, \Sigma)    (6)

b_r(y_i)_j \approx \frac{1}{d} \left[ \kappa_a\left(y_i + \frac{d}{2}, \Sigma\right)_j - \kappa_a\left(y_i - \frac{d}{2}, \Sigma\right)_j \right]    (7)

\kappa_a(y, \Sigma) = G(y, \Sigma) * G'(y, \Sigma)    (8)

expanding G and G',

\kappa_a(y, \Sigma) = \frac{1}{2^{M+1} \pi^{M/2} \Sigma^{M+2}} \exp\left( -\frac{y^T \Sigma^{-2} y}{4} \right) y    (9)
where κ_a(·) is a kernel; a Gaussian PDF with standard deviation Σ = σ²I is assumed here. y_i symbolises a sample of either Y_1 or Y_2, or of the concatenation Y_{1,2} = [Y_1; Y_2] for J_{Y_{1,2}}. M is the dimensionality of the output space and is M_1, M_2 or M_1 + M_2 depending on the term of (4) that is considered. The j-th element of b_r(y_i) in (7) is denoted b_r(y_i)_j, d is the support of the output space and N is the number of samples.
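The kernel (9) and the per-sample terms (6)-(7) can be transcribed roughly as follows. This is only a sketch: it assumes Σ = σI, and it reads the finite difference in (7) as being taken per output coordinate over the support d, which is one possible interpretation of the original layout rather than the authors' implementation.

```python
import numpy as np

def kappa_a(y, sigma, M):
    # Eq. (9) with Sigma = sigma * I: a Gaussian-derivative style kernel
    # that returns a vector (note the trailing factor y).
    norm = 1.0 / (2.0**(M + 1) * np.pi**(M / 2.0) * sigma**(M + 2))
    return norm * np.exp(-np.dot(y, y) / (4.0 * sigma**2)) * np.asarray(y, dtype=float)

def epsilon_terms(Y, d, sigma):
    # Eq. (6): per-sample terms for the mapped samples Y (N x M).
    N, M = Y.shape
    eps = np.zeros_like(Y)
    for i in range(N):
        # pairwise attraction term (second term of Eq. (6))
        attract = sum(kappa_a(Y[i] - Y[j], sigma, M) for j in range(N) if j != i) / N
        # b_r(y_i) as in Eq. (7): finite difference of kappa_a over the support,
        # evaluated here coordinate by coordinate (an assumption of this sketch)
        b = np.zeros(M)
        for j in range(M):
            shift = np.zeros(M)
            shift[j] = d / 2.0
            b[j] = (kappa_a(Y[i] + shift, sigma, M)[j]
                    - kappa_a(Y[i] - shift, sigma, M)[j]) / d
        eps[i] = b - attract
    return eps
```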
For systems where the dimensionality of the input space N is greater than the number of samples n, the mapping can be arbitrary. To obtain a single solution, a penalty on the projection coefficients α_1 and α_2 can be imposed. The minimal energy solution can be obtained by imposing the L_2 penalty, while the L_1 norm is shown to lead to the sparsest solution. The fact that the L_1 penalty leads to a vector with the fewest nonzero elements for both overdetermined and underdetermined systems has been demonstrated [14].
A. Optimizing Linear Mappings via the L_2 Regularisation
Projection coefficients that maximise the objective function can now be found using the algorithm given in Fig. 1, which includes the update rule (6) for each entropy term in (1) and the imposition of an L_2 penalty (L_{2(\alpha_1)}, L_{2(\alpha_2)}) on the projection coefficients α_1 and α_2,

J = J_{Y_1} + J_{Y_2} - J_{Y_{1,2}} - \beta \left( L_{2(\alpha_1)} + L_{2(\alpha_2)} \right)    (10)
where the L_2 penalty is derived from

L_{2(\alpha_1)} = \frac{\partial}{\partial \alpha_1} \left( \alpha_1 \alpha_1^T \right)    (11)

therefore

L_{2(\alpha_1)} = 2 Y_1 X_1^{-1} (X_1^{-1})^T    (12)

L_{2(\alpha_2)} = 2 Y_2 X_2^{-1} (X_2^{-1})^T    (13)

where X^{-1} is the pseudo-inverse of matrix X.
B. Optimizing Linear Mappings via the L_1 Regularisation
The L_2 criterion seeks to spread the energy of α_1 and α_2 over many small valued components, rather than concentrating the energy on a few dominant ones. The applications examined in this paper require identifying a few dominant components in the input signal space that are related to each other. Hence, the solution for the parameter vectors α_1 and α_2 should be sparse, identifying the minimum number of nonzero elements, which naturally suggests the use of the L_1 norm as an appropriate penalty function. In addition, the number of samples and the dimensionality of the signals can vary between applications, producing either an underdetermined or an overdetermined system of equations Y_1 = α_1 X_1 and Y_2 = α_2 X_2. The L_1 norm performs as well as the L_2 norm on overdetermined systems of equations while outperforming the L_2 norm for underdetermined problems [9], especially where the solution is expected to have fewer nonzeros than 1/8 of the number of equations.
The update equation for the gradient descent method when using the L_1 penalty is

J = J_{Y_1} + J_{Y_2} - J_{Y_{1,2}} - \beta \left( L_{1(\alpha_1)} + L_{1(\alpha_2)} \right)    (14)
The equations for the L_1 norm penalty are derived from

\min \|\alpha_1\|_1 \text{ subject to } Y_1 = \alpha_1 X_1
\min \|\alpha_2\|_1 \text{ subject to } Y_2 = \alpha_2 X_2    (15)

where \| \cdot \|_1 represents the L_1 norm. Since the projections α_1, α_2 may be of very high dimensionality, it is assumed that

\min \|\alpha_1\|_1 = |\alpha_{1_1}| + |\alpha_{1_2}| + \cdots + |\alpha_{1_n}|    (16)

Therefore the L_1 penalty is

\frac{\partial \min \|\alpha_1\|_1}{\partial Y_1}    (17)

further

\frac{\partial |\alpha_1|}{\partial Y_{1_1}} = \sum_{i=1}^{n} \frac{\partial |\alpha_{1_i}|}{\partial Y_{1_1}} = \sum |X_1^{-1}|_{row\,1} \, sign|Y_{1_1}|
...
\frac{\partial |\alpha_1|}{\partial Y_{1_i}} = \sum_{i=1}^{n} \frac{\partial |\alpha_{1_i}|}{\partial Y_{1_i}} = \sum |X_1^{-1}|_{row\,i} \, sign|Y_{1_i}|

resulting in

\frac{\partial \min \|\alpha_1\|_1}{\partial Y_1} = \sum |X_1^{-1}| \, sign|Y_1|    (18)
C. Stopping Criteria
All iterative optimization methods require stopping cri-
teria to indicate the successful completion of the process.
Consider,
\delta = \frac{\max(\Delta_{NN}) - \min(\Delta_{NN})}{\max(\Delta)}    (19)

where Δ_NN is the nearest-neighbour distance in the resulting output distribution, Δ is the distance between any two samples in the output distribution, and max(·) and min(·) are the maximum and minimum distances between samples in the output space. The numerator is a measure of the uniformity of the output space and the denominator is a measure of how well the output space is filled. Therefore, (19) can be used as a convergence criterion. However, δ is dependent on the number of samples obtained from the signal n, the dimensionality N and the size of the output space d. As the numerator approaches zero for uniformly distributed samples, and for a given required threshold γ, δ may be determined by γ d^N / n. For all experiments in this paper the following parameter values have been chosen.

Fig. 1. Block diagram of the proposed method; η is the learning rate, β is the normaliser on the L_1/L_2 penalties applied to the projection coefficients α_1 and α_2.
TABLE I
OPTIMIZATION LEARNING RATE COEFFICIENTS

η = M_1 M_2 / (N_1 N_2)
β = max( max(X_2) N_2 / N_1 , max(X_1) N_1 / N_2 )
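The convergence measure δ in (19) depends only on pairwise distances between the mapped samples; a minimal numpy sketch (illustrative, not the authors' code) is:

```python
import numpy as np

def convergence_measure(Y):
    # Eq. (19): uniformity of nearest-neighbour spacing relative to the
    # overall spread of the mapped samples Y (n samples x d output dims).
    diff = Y[:, None, :] - Y[None, :, :]
    D = np.sqrt((diff**2).sum(-1))       # n x n pairwise distances
    max_all = D.max()
    np.fill_diagonal(D, np.inf)          # exclude self-distances
    nn = D.min(axis=1)                   # nearest-neighbour distance per sample
    return (nn.max() - nn.min()) / max_all
```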
IV. SIMULATION AND EXPERIMENTAL RESULTS
For the simulation and experimental study, the output space dimensionality is chosen to be d = 2. For a sample size n = 100, the stopping criterion from equation (19) is calculated to be δ < 0.035. In order to detect that the optimization has reached a local minimum, the variation of δ should be contained within a 1.5e-3 limit for a minimal convergence span of at least 5 iterations.
A. Simulation Results
Two simulations are performed to evaluate the proposed method. Simulation 1: the purpose is to detect identical signal pairings embedded within a number of unrelated signals. Simulation 2: the purpose is to identify non-informative signals. We have utilised Johnson's [12] method of generating signals with an arbitrarily high order of dependency. Signals generated for the purpose of simulation are scaled to [-1, 1].
Simulation 1: Identical Signals: One hundred signals are generated, containing 100 samples each. Five signals are selected and supplied as sensor 1 output {1, 2, 3, 4, 5} and one signal is selected as sensor 2 output {1}; thus N_1 = 5 and N_2 = 1, with one signal in common.

In order to determine the most informative signal we examine the vector of α_1 coefficients, where each α_{1_i} corresponds to X_{1_i}. Results are presented in Fig. 2, with the mapping coefficients α_{1_i}, i ∈ {1, ..., 5}, plotted in blue, red, green, cyan and yellow respectively, and the convergence criterion δ plotted as the dashed gray line. The results show the highest coefficient for α_{1_1}, confirming that signal 1 is common between the sensors. Applying the L_1 norm penalty to the optimization produced faster convergence, occurring at iteration 38 compared to iteration 142 with the L_2 norm penalty. It is to be noted that ideally only the mapping parameter α_{1_1} should be non-zero and all others should be zero. However, due to the approximations in the objective function and the presence of local minima, the other mapping parameters have small non-zero values.

Fig. 2. Results of indirect estimation of mutual information for signals with underlying linear dependency: (a) L_1 penalty, (b) L_2 penalty.

Fig. 3. Results of indirect estimation of MI for non-informative signals (α_1 coefficients and δ versus iteration): (a) L_1 penalty, (b) L_2 penalty.
Simulation 2: Non-Informative Signals: In this simulation, signals {1, 2, 3, 4, 5} are selected as sensor 1 output and signal {6} is chosen as the sensor 2 output; clearly there are no common signals. Fig. 3 shows that neither the L_1 nor the L_2 norm penalty has produced convergence in 200 iterations. In fact, the solution based on the L_1 regularization shows a divergence from an optimized solution, verifying that there is no common signal.
B. Experiments
Two experiments are performed to evaluate the proposed
method in establishing the relationship between multi-modal
sensory data by identifying informative signals without any
prior knowledge about geometric parameters. Experiment 1:
The purpose is to localise the audio source in the video
data sequence. Experiment 2: The purpose is to identify the
common source in a laser and video data stream.
Experiment 1: Audio and Video Signals: A microphone
and camera were used to capture activity in an office en-
vironment consisting of a person (left on image) reading a
sequence of numbers and another person (right of image)
mimicking unscripted sentences (see Fig. 4(a)). Video data
was captured at 15Hz while audio signal was captured at
48KHz with only 10KHz of content used. Both video and
audio data streams were synchronised in time. Color images
acquired were transformed to grey scale and pixel intensity
values (consisting of 640 480 = 307200 pixels per frame)
of 100 frames were analyzed using raw pixel values. The


References

  • Thomas M. Cover and Joy A. Thomas, Elements of Information Theory (book, 1991). Entropy, relative entropy and mutual information, including the data-processing inequality cited in this paper [6].
  • Journal article on sparse signal recovery by iteratively reweighted l1 minimization, which in many settings needs substantially fewer measurements than plain l1 minimization for exact recovery; cited in this paper [5] in connection with L1-based recovery of signals from incomplete, noisy measurements.
  • Journal article surveying mutual-information-based registration in medical image processing (methodology and applications); cited in this paper [16] in connection with combining nonparametric density estimators with approximations of the theoretical entropy.
  • Journal article on density estimation and nonparametric regression for data exploration and inference; cited in this paper [3] in connection with numerical optimization by Powell's direction set method.
  • John W. Fisher and Trevor Darrell, journal article on signal-level fusion of audio and visual signals from a common source, using a nonparametric, information-theoretic measure of cross-modal correspondence; cited in this paper in connection with indirect MI estimation.
