Multimodal Data Fusion: An Overview
of Methods, Challenges and Prospects
Dana Lahat, Tülay Adalı, Fellow, IEEE, and Christian Jutten, Fellow, IEEE
Abstract—In various disciplines, information about the same
phenomenon can be acquired from different types of detectors, at
different conditions, in multiple experiments or subjects, among
others. We use the term “modality” for each such acquisition
framework. Due to the rich characteristics of natural phenomena,
it is rare that a single modality provides complete knowledge
of the phenomenon of interest. The increasing availability of
several modalities reporting on the same system introduces new
degrees of freedom, which raise questions beyond those related to
exploiting each modality separately. As we argue, many of these
questions, or “challenges”, are common to multiple domains.
This paper deals with two key questions: “why we need data
fusion” and “how we perform it”. The first question is motivated
by numerous examples in science and technology, followed by a
mathematical framework that showcases some of the benefits that
data fusion provides. In order to address the second question,
“diversity” is introduced as a key concept, and a number of data-
driven solutions based on matrix and tensor decompositions are
discussed, emphasizing how they account for diversity across the
datasets. The aim of this paper is to provide the reader, regardless
of his or her community of origin, with a taste of the vastness
of the field, the prospects and opportunities that it holds.
Index Terms—Data fusion, multimodality, multiset data
analysis, latent variables, tensor, overview.
I. INTRODUCTION
Information about a phenomenon or a system of interest can
be obtained from different types of instruments, measurement
techniques, experimental setups, and other types of sources.
Due to the rich characteristics of natural processes and envi-
ronments, it is rare that a single acquisition method provides
complete understanding thereof. The increasing availability of
multiple datasets that contain information, obtained using dif-
ferent acquisition methods, about the same system, introduces
new degrees of freedom that raise questions beyond those
related to analysing each dataset separately.
The foundations of modern data fusion have been laid in the
first half of the 20th century [1], [2]. Joint analysis of multiple
datasets has since been the topic of extensive research, and
took a significant leap forward in the late 1960s and early
1970s with the formulation of concepts and techniques such
as multi-set canonical correlation analysis (CCA) [3], parallel
factor analysis (PARAFAC) [4], [5], and other tensor decom-
positions [6], [7]. However, until rather recently, in most cases,
these data fusion methodologies were confined within the
limits of psychometrics and chemometrics, the communities
in which they evolved. With recent technological advances, in
a growing number of domains, the availability of datasets that
correspond to the same phenomenon has increased, leading to
increased interest in exploiting them efficiently. Many of the
providers of multi-view, multirelational, and multimodal data
are associated with high-impact commercial, social, biomedical,
environmental, and military applications, and thus the
drive to develop new and efficient analytical methodologies is
high and reaches far beyond pure academic interest.

(Affiliation and funding note: D. Lahat and Ch. Jutten are with GIPSA-Lab,
UMR CNRS 5216, Grenoble Campus, BP46, F-38402 Saint Martin d'Hères, France.
T. Adalı is with the Department of CSEE, University of Maryland, Baltimore
County, Baltimore, MD 21250, USA. Email: {Dana.Lahat, Christian.Jutten}@gipsa-lab.grenoble-inp.fr, adali@umbc.edu.
This work is supported by the project CHESS, 2012-ERC-AdG-320684 (D. Lahat
and Ch. Jutten) and by grants NSF-IIS 1017718 and NSF-CCF 1117056 (T. Adalı).
GIPSA-Lab is a partner of the LabEx PERSYVAL-Lab (ANR-11-LABX-0025).)
Motivations for data fusion are numerous. They include
obtaining a more unified picture and global view of the system
at hand; improving decision making; exploratory research; an-
swering specific questions about the system, such as identify-
ing common vs. distinctive elements across modalities or time;
and in general, extracting knowledge from data for various
purposes. However, despite the evident potential benefit, and
massive work that has already been done in the field (see, for
example, [8]–[16] and references therein), the knowledge of
how to actually exploit the additional diversity that multiple
datasets offer is still in its very early stages.
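Before turning to the challenges, here is a concrete taste of the matrix and tensor tools mentioned above: a minimal sketch of fitting a rank-R PARAFAC/CP model [4], [5] by alternating least squares. This is our own illustration, not an algorithm proposed in this paper; the khatri_rao helper, the unfolding conventions, and the synthetic test tensor are choices made purely for the example.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def cp_als(T, R, n_iter=200, seed=0):
    """Fit a rank-R CP (PARAFAC) model T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]
    by alternating least squares. Bare-bones: no normalization, no
    convergence check, no handling of degenerate solutions."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    # Mode-n unfoldings, with orderings chosen to match khatri_rao above.
    T1 = T.reshape(I, J * K)
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Recover the factors of a synthetic, exactly rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (8, 9, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # relative error, typically near zero
```

Even this toy hints at why such models are attractive: under mild conditions the CP decomposition is essentially unique, which is exactly the kind of benefit that the diversity discussion below attributes to exploiting more than one "mode" of the data.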
Data fusion is a challenging task for several reasons [8]–
[11], [17]–[19]. First, the data are generated by very complex
systems: biological, environmental, sociological, and psycho-
logical, to name a few, driven by numerous underlying pro-
cesses that depend on a large number of variables to which
we have no access. Second, due to the augmented diversity,
the number, type and scope of new research questions that can
be posed is potentially very large. Third, working with hetero-
geneous datasets such that the respective advantages of each
dataset are maximally exploited, and drawbacks suppressed,
is not a straightforward task. We elaborate on these matters in the
following sections. Most of these questions have been posed
only in recent years, and, as we show in the sequel,
only a fraction of their potential has already been exploited.
Hence, we refer to them as “challenges”.
A rather wide perspective on challenges in data fusion is
presented by [8], which discusses linked-mode decomposition
models within the framework of chemometrics and psycho-
metrics, and [9], which focuses on “automated decision
making” with special attention to multisensor information
fusion. In practice, however, challenges in data fusion are most
often brought up within a framework dedicated to a specific
application, model and dataset; examples will be given in the
sections that follow.
In this paper, we bring together a comprehensive (but
definitely not exhaustive) list of challenges in data fusion.

Following from [8], [9], [16], [19] (and others), and further
emphasized by our discussion in this paper, it is clear that
at the appropriate level of abstraction, the same challenge in
data fusion can be relevant to completely different and diverse
applications, goals and data types. Consequently, a solution to
a challenge that is based on a sufficiently data-driven, model-
free approach may turn out to be useful in very different
domains. Therefore, there is an obvious interest in opening
up the discussion of data fusion challenges to include and
involve disparate communities, so that each community could
inform the others. Our goal is to stimulate and emphasize the
relevance and importance of a perspective based on challenges
to advanced data fusion. More specifically, we would like
to promote data-driven approaches, that is, approaches with
minimal and weak priors and constraints, such as sparsity,
non-negativity, low-rank and independence, among others,
that can be useful to more than one specific application or
dataset. Hence, we present these challenges in quite a general
framework that is not specific to an application, goal or data
type. We also give examples and motivations from different
domains.
In order to contain our discussion, we focus on setups in
which a phenomenon or a system is observed using multiple
instruments, measurement devices or acquisition techniques. In
this case, each acquisition framework is denoted as a modality
and is associated with one dataset. The whole setup, in which
one has access to data obtained from multiple modalities, is
known as multimodal. A key property of multimodality is
complementarity, in the sense that each modality brings to
the whole some type of added value that cannot be deduced
or obtained from any of the other modalities in the setup.
In mathematical terms, this added value is known as diversity.
Diversity makes it possible to reduce the number of degrees of freedom in
the system by providing constraints that enhance uniqueness,
interpretability, robustness, performance, and other desired
properties, as will be illustrated in the rest of this paper.
Diversity can be found in a broad range of scenarios and plays
a key role in a wide scope of mathematical and engineering
studies. Accordingly, we suggest the following operative def-
inition for the special type of diversity that is associated with
multimodality:
Definition I.1: Diversity (due to multimodality) is the
property that makes it possible to enhance the uses, benefits, and
insights (such as those discussed in Section II), in a way
that cannot be achieved with a single modality.
Diversity is the key to data fusion, as will be explained in
Section III. Furthermore, in Section III, we demonstrate how a
diversity approach to data fusion can provide a fresh new look
on previously well-known and well-founded data and signal
processing techniques.
As already noted, “data fusion” is quite a diffuse concept
that takes on different interpretations depending on the application
and goal [8], [9], [20]. Therefore, within the context of this
paper, and in accordance with the types of problems on
which we focus, our emphasis is on the following tighter
interpretation [21]:
Definition I.2: Data fusion is the analysis of several
datasets such that different datasets can interact and
inform each other.
This concept will be given a more concrete meaning in
Sections III and V.
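As a toy concretization of Definition I.2 (our own sketch, not a method advocated by this paper), consider two datasets X1 and X2 that are assumed to share a common latent factor A, as in X1 ≈ AB1ᵀ and X2 ≈ AB2ᵀ. Fitting the two factorizations jointly is one elementary way of letting the datasets "interact and inform each other":

```python
import numpy as np

def coupled_factorization(X1, X2, R, n_iter=200, seed=0):
    """Jointly fit X1 ~ A @ B1.T and X2 ~ A @ B2.T with a *shared* factor A,
    by alternating least squares. The coupling through A is what lets the
    two datasets interact and inform each other (cf. Definition I.2)."""
    rng = np.random.default_rng(seed)
    B1 = rng.standard_normal((X1.shape[1], R))
    B2 = rng.standard_normal((X2.shape[1], R))
    for _ in range(n_iter):
        # The update of the shared factor sees both datasets at once.
        A = np.hstack([X1, X2]) @ np.linalg.pinv(np.vstack([B1, B2]).T)
        B1 = X1.T @ np.linalg.pinv(A.T)
        B2 = X2.T @ np.linalg.pinv(A.T)
    return A, B1, B2

# Toy usage: two noisy views generated from the same latent A0.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((200, 3))
X1 = A0 @ rng.standard_normal((3, 40)) + 0.1 * rng.standard_normal((200, 40))
X2 = A0 @ rng.standard_normal((3, 60)) + 0.1 * rng.standard_normal((200, 60))
A, B1, B2 = coupled_factorization(X1, X2, R=3)
```

Factorizing X1 alone leaves A determined only up to an arbitrary invertible transform; estimating A from both views simultaneously is a first example of the extra constraints that data fusion brings.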
The goal of this paper is to provide some ideas, perspec-
tives, and guidelines as to how to approach data fusion. This
paper is not a review, not a literature survey, not a tutorial
nor a cookbook. As such, it does not propose or promote
any specific solution or method. On the contrary, our message
is that whatever specific method or approach is considered,
it should be kept in mind that it is just one among a very
large set, and should be critically judged as such. In the same
vein, any example in this paper should only be regarded as a
concretization of a much broader idea.
How to read this paper? In order to make this paper
accessible for readers with various interests and back-
grounds, it is organized in two types of cross-sections.
The first part (Sections II–III) deals with the question
“why?”, i.e., why we need data fusion. The second part
(Sections IV–V) deals with the question “how?”, i.e., how
we perform data fusion. Each question is treated on two
levels: data (Sections II and IV), and theory (Sections III
and V). More specifically, Section II presents the concepts
of multimodality and data fusion, and motivates them
using examples from various applications. In Section III
we introduce the concept of diversity as a key to data
fusion, and give it a concrete mathematical formulation.
Section IV discusses complicating factors that should be
addressed in the actual processing of heterogeneous data.
Section V gives some guidelines as to how to actually
approach a data fusion problem from a model design
perspective. Section VI concludes our work.
II. WHAT IS MULTIMODALITY? WHY DO WE NEED
MULTIMODALITY?
For living creatures, multimodality is a very natural concept.
Living creatures use external and internal sensors, sometimes
denoted as “senses”, in order to detect and discriminate among
signals, communicate, cross-validate, disambiguate, and add
robustness to numerous life-and-death choices and responses
that must be taken rapidly, in a dynamic and constantly
changing internal and external environment.
The well-accepted paradigm that certain natural processes
and phenomena can express themselves under completely
different physical guises is the raison d’être of multimodal
data fusion. Too often, however, very little is known about the
underlying relationships among these modalities. Therefore,
the most obvious and essential endeavour to be undertaken
in any multimodal data analysis task is exploratory: to learn
about relationships between modalities, their complementarity,
shared vs. modality-specific information content, and other
mutual properties.
In this section, we try to provide, by numerous practical
examples, a more concrete sense of what we mean when we
speak of “diversity” and “multimodality”. The examples below
illustrate the complementary nature of multimodal data, and
as a result, some of the prominent uses, benefits and insights
that can be obtained from properly exploiting multimodal data,
especially as opposed to the analysis of single-set and single-
modal data. They also present various complicating factors,
due to which multimodal data fusion is not a straightforward task.
The purpose of this section is to show that multimodality is
already present in almost every field of science and technology,
and thus it is of potential interest to everyone.
A. Multisensory Systems
Example II-A.1: Audio-Visual Multimodality. Audio-visual
multimodality is probably the most intuitive, since it uses
two of our most informative senses. Most human verbal com-
munication involves seeing the speaker [18]. Indeed, a large
number of audio-visual applications involve human speech and
vision. In such applications, it is usually the audio channel that
conveys the information of interest. It is well-known that audio
and video convey complementary information. Audio has the
advantage over video that it does not require line of sight.
On the other hand, the visual modality is resistant to various
factors that make audio and speech processing difficult, such as
ambient noise, reverberations, and other acoustic disturbances.
Perhaps the most striking evidence of the caution that
must be exercised in the design and use of multimodal
systems is the “McGurk effect” [18]. In their seminal paper,
McGurk and MacDonald [18] showed that presenting
contradictory, or discrepant, speech [“ba”] and visual lip
movements [“ga”] can cause a human to perceive a completely
different syllable [“da”]. These unexpected results have since
been the subject of ongoing exploratory research on human
perception and cognition [22, Section VI.A.5]. The McGurk
effect serves as an indication that in real-life scenarios, data
fusion can take paths much more intricate than simple sum-
mation of information. No less important, it serves as a lesson
that fusing modalities can yield undesired results and severe
degradation of performance if the underlying relationships
between modalities are not properly understood.
Nowadays, audio-visual multimodality is used for a broad
range of applications [10], [23]. Examples include: speech
processing, including speech recognition, speech activity de-
tection, speech enhancement, speaker extraction and separa-
tion; scene analysis, for example tracking a speaker within
a group, biometrics and monitoring, for safety and security
applications [24]; human-machine interaction (HMI) [10];
calibration [25] [10, Section V.C]; and more.
Example II-A.2: Human-Machine Interaction. A domain
that is heavily inspired by natural multimodality is HMI. In
HMI, an important task is to design modalities that will make
HMI as natural, efficient and intuitive as possible [11]. The
idea is to combine multiple interaction modes based on audio-
vision, touch, smell, movement (e.g., gesture detection and
user tracking), interpretation of human language commands,
and other multisensory functions [10], [11]. The principal
point that makes HMI stand out among other multimodal
applications that we mention is that, in HMI, the modalities
are often interactive (as their name implies). Unlike other
multimodal applications that we mention, not one but two
very different types of systems (human and machine) are “ob-
served” by each other’s sensors, and the goal of data fusion is
not only to interpret each system’s output, but also to actively
convey information between these two systems. An added
challenge is that this task should usually be accomplished
in real-time. An additional complicating factor that makes
multimodal HMI stand out is that the human
user often plays an active part in the choice of modalities
(from the available set) and in the way that they are used
in practice. This implies that the design of the multimodal
setup and data fusion procedure must rely not only on the
theoretically and technologically optimal combination of data
streams but also on the ability to predict and adapt to the
subjective cognitive preferences of the individual user. We
refer to [11] (and references therein) for further discussion
of these aspects.
B. Biomedical, Health
Example II-B.1: Understanding Brain Functionality. Func-
tional brain study deals with understanding how the different
elements of the brain take part in various perceptual and
cognitive activities. Functional brain study largely relies on
non-invasive imaging techniques, whose purpose is to recon-
struct a high-resolution spatio-temporal image of the neuronal
activity within the brain. The neuronal activity within the brain
generates ionic currents that are often modelled as dipoles.
These dipoles induce electric and magnetic fields that can
be directly recorded by electroencephalography (EEG) and
magnetoencephalography (MEG), respectively. In addition,
neuronal activity induces changes in magnetization between
oxygen-rich and oxygen-poor blood, known as the haemody-
namic response. This effect, also called blood-oxygen-level
dependent (BOLD) changes, can be detected by functional
magnetic resonance imaging (fMRI). Therefore, fMRI is an
indirect measure of neuronal activity. These three modalities
register data at regular time intervals and thus reflect temporal
dynamics. However, these techniques vary greatly in their
spatio-temporal resolutions: EEG and MEG data provide high
temporal [millisecond] resolution, whereas fMRI images have
low temporal [second] resolution. fMRI data are a set of
high-resolution 3D images, taken at regular time intervals,
representing the whole volume of the brain of a patient lying
in an fMRI scanner. EEG and MEG data are a set of time-
series signals reflecting voltage or neuromagnetic field changes
recorded at each of the (usually a few dozen) electrodes
attached to the scalp (EEG) or fixed within an MEG scanner
helmet. The sensitivity of EEG and MEG to deep-brain signals
is limited. In addition, they have different selectivity to signals
as a function of brain morphology. Therefore, they provide
data at much poorer spatial resolution and do not have access
to the full brain volume. Consequently, the spatio-temporal in-
formation provided by EEG, MEG and fMRI is highly comple-
mentary. Functional imaging techniques can be complemented
by other modalities that convey structural information. For
example, structural magnetic resonance imaging (sMRI) and
diffusion tensor imaging (DTI) report on the structure of the
brain in terms of gray matter, white matter and cerebrospinal
fluid. sMRI is based on nuclear magnetic resonance of water
protons. DTI measures the diffusion process of molecules,
mainly water, and thus reports also on brain connectivity. Each
of these methods is based on different physical principles and
is thus sensitive to different types of properties within the
brain. In addition, each method has different pros and cons in
terms of safety, cost, accuracy, and other parameters. Recent
technological advances allow recording data from several
functional brain imaging techniques simultaneously [26], [27],
thus further motivating advanced data fusion.
It is a well-accepted paradigm in neuroscience that EEG
and fMRI carry complementary information about brain func-
tion [26], [28]. However, their very heterogeneous nature and
the fact that brain processes are very complicated systems that
depend on numerous latent phenomena imply that simultane-
ously extracting useful information from them is not a simple
task. The fact that there is no ground truth is reflected in the
very broad range of methods and approaches that are being
proposed [12], [15], [17], [21], [28]–[31]. Works on biomed-
ical brain imaging often emphasize the exploratory nature of
this task. Despite decades of study, the underlying relationship
between EEG and fMRI is far from being understood [17],
[29], [30], [32].
A well-known challenge in brain imaging is the EEG inverse
problem. A prevalent assumption is that the measured EEG
signal is generated by numerous current dipoles within the
brain, and the goal is to localise the origins of this neuronal
activity. Often formulated as a linear inverse problem, it is
ill-posed: many different spatial current patterns within the
skull can give rise to identical measurements [33]. In order
to make the problem well-posed, additional hypotheses
are required. A large number of solutions are based on
adding various priors to the EEG data [34]. Alternatively, an
identifiable and unique solution can be obtained using spatial
constraints from fMRI [12], [22], [30].
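To make the last point concrete, here is a toy sketch of our own (not one of the cited methods): a weighted minimum-norm solution of y = Lx + n in which sources flagged as active by fMRI receive larger prior weights, steering the otherwise non-unique EEG solution. The leadfield, dimensions, regularization, and weighting scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 32, 500                   # hypothetical dimensions
L = rng.standard_normal((n_sensors, n_sources))  # toy leadfield (a real one comes from a head model)

# Ground truth: two active sources; suppose fMRI flags exactly these locations.
x_true = np.zeros(n_sources)
x_true[[100, 300]] = [1.0, -1.5]
y = L @ x_true + 0.01 * rng.standard_normal(n_sensors)

def wmne(y, L, w, lam=1e-2):
    """Weighted minimum-norm estimate: argmin_x ||y - L x||^2 + lam * x' W^-1 x,
    with W = diag(w**2); closed form x = W L' (L W L' + lam I)^-1 y."""
    W = np.diag(w ** 2)
    G = L @ W @ L.T + lam * np.eye(L.shape[0])
    return W @ L.T @ np.linalg.solve(G, y)

w_flat = np.ones(n_sources)                      # no spatial prior
w_fmri = np.ones(n_sources)
w_fmri[[100, 300]] = 10.0                        # fMRI-informed spatial prior
for w, name in [(w_flat, 'flat prior'), (w_fmri, 'fMRI-weighted prior')]:
    x_hat = wmne(y, L, w)
    # Correlation with ground truth; typically much higher with the fMRI prior.
    print(name, np.corrcoef(x_hat, x_true)[0, 1])
```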
Example II-B.2: Medical Diagnosis. Various medical condi-
tions such as potentially malignant tumours cannot be diag-
nosed by a single type of measurement due to many factors
such as low sensitivity, low positive predictive values, low
specificity (high false-positive rate), a limited number of spatial
samples (as in biopsy), and other limitations of the various
assessment techniques. In order to improve the performance
of the diagnosis, risk assessment and therapy options, it is
necessary to perform numerous medical assessments based on
a broad range of medical diagnostic techniques [35], [36]. For
example, one can augment physical examination, blood-tests,
biopsies, static and functional magnetic resonance imaging,
with other parameters such as genetic, environmental and
personal risk factors. The question of how to analyse all these
simultaneously available resources is largely open. Currently,
this task relies mostly on human medical experts. One of the
main challenges is the automation of such decision procedures,
in order to improve correct interpretation, as well as save costs
and time [35].
Example II-B.3: Developing Non-Invasive Medical Diag-
nosis Techniques. In some cases, the use of multimodal
data fusion is only a first step in the design of a single-
modal system. In [37], the challenge is understanding the
link between surface and intra-cardiac electrodes measuring
the same atrial fibrillation event and the goal is eventually
extracting relevant atrial fibrillation activity using only the
non-invasive modality. For this aim, the intra-cardiac modality
is exploited as a reference to guide the extraction of an atrial
electrical signal of interest from non-invasive electrocardiog-
raphy (ECG) recordings. The difficulty lies in the fact that the
intra-cardiac modality provides a rather pure signal whereas
the ECG signal is a mixture of the desired signal with other
sources, and the mixing model is unknown.
Example II-B.4: Smart Patient Monitoring. Health moni-
toring using multiple types of sensors is drawing increasing
attention from modern health services. The goal is to provide
a set of non-invasive, non-intrusive, reasonable-cost sensors
that allow the patient to run a normal life while providing
reliable warnings in real-time. Here, we focus on monitoring,
predicting, and warning epileptic patients of potentially
dangerous seizures [38]. The gold standard in monitoring
epileptic seizures is combining EEG and video, where EEG
is manually analysed by experts and the whole diagnostic
procedure requires a stay of up to several days in a hospital
setting. This procedure is expensive, time consuming, and
physically inconvenient for the patient. Obviously, it is not
practical for daily life. While much effort has already been
dedicated to the prediction of epileptic seizures from EEG,
with no clear-cut results so far, a considerable proportion of
potentially lethal seizures are hardly detectable by EEG at
all. Therefore, a primary challenge is to understand the link
between epileptic seizures and additional body parameters:
movement, breathing, heart-rate, and others. Due to the fact
that epileptic seizures vary within and across patients, and due
to the complex relations between different body systems, it is
likely that any such system should rely on more than one
modality [38].
C. Environmental Studies
Example II-C.1: Remote Sensing and Earth Observations.
Various sensor technologies can report on different aspects
of objects on Earth. Passive optical hyperspectral (resp. mul-
tispectral) imaging technologies report on material content
of the surface by reconstructing its spectral characteristics
from hundreds of (resp. a few) narrow (resp. broad) adjacent
spectral bands within the visible range and beyond. A third
type of optical sensor is panchromatic imaging, which
generates a monochromatic image with a much broader band.
Typical spatial resolutions of hyperspectral, multispectral and
panchromatic images are tens of meters, a few meters and
less than one meter, respectively. Hence, there exists a trade-
off between spectral and spatial resolution [39], [40] [13,
Chapter 9]. Topographic information can be acquired from
active sensors such as light detection and ranging (LiDAR)
and synthetic aperture radar (SAR). LiDAR is based on a
narrow pulsed laser beam and thus provides highly accurate
information about distance to objects, i.e., altitude. SAR is
based on radio waves that illuminate a rather wide area,
and the backscattered components reaching the sensor are
registered; interpreting the reflections from the surface requires
some additional processing with respect to (w.r.t.) LiDAR.
Both technologies can provide information about elevation,
three-dimensional structure of the observed objects, and their
surface properties. LiDAR, being based on a laser beam,
generally reports on the structure of the surface, although it
can partially penetrate through certain areas such as forest
canopy, providing information on the internal structure of the
trees, for example. This ability is a mixed blessing, however,
since it generates reflections that have to be accounted for.
SAR and LiDAR use different electromagnetic frequencies
and thus interact differently with materials and surfaces. As
an example, depending on the wavelength, SAR may see the
canopy as a transparent object (waves reach the soil under
the canopy), semi-transparent (they penetrate in the canopy
and interact with it) or opaque (they are reflected by the top
of the canopy). Optical techniques are passive, which implies
that they rely on natural illumination. Active sensors such as
LiDAR and SAR can operate at night and in shaded areas [41].
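One classical, simple instance of fusing such complementary optical products is pansharpening: injecting the spatial detail of a panchromatic image into an upsampled multispectral cube. The Brovey-style sketch below is our own minimal illustration on synthetic arrays; real pipelines use co-registered imagery and far more careful resampling and radiometric matching.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-8):
    """Brovey-style component substitution: upsample the multispectral (MS)
    cube to the panchromatic (PAN) grid, then rescale every band by the
    ratio of PAN to the MS intensity, injecting PAN's spatial detail.
    ms: (h, w, bands) low-res; pan: (r*h, r*w) high-res, integer ratio r."""
    r = pan.shape[0] // ms.shape[0]
    up = ms.repeat(r, axis=0).repeat(r, axis=1)  # nearest-neighbour upsampling
    intensity = up.mean(axis=2)                  # crude intensity component
    return up * (pan / (intensity + eps))[..., None]

# Toy usage on random (but positive) data.
rng = np.random.default_rng(0)
ms = rng.uniform(0.1, 1.0, (16, 16, 4))          # 4-band MS on a coarse grid
pan = rng.uniform(0.1, 1.0, (64, 64))            # PAN at 4x the resolution
print(brovey_pansharpen(ms, pan).shape)          # -> (64, 64, 4)
```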
Beyond the strengths and weaknesses of each technology
w.r.t. the others, the use of each is limited by a certain in-
herent ambiguity. For example, hyperspectral imaging cannot
distinguish between objects made of the same material that
are positioned at different elevations, such as concrete roofs
and roads. LiDAR cannot distinguish between objects with
the same elevation and surface roughness that are made of
different materials such as natural and artificial grass [42].
SAR images may sometimes be difficult to interpret due to
their complex dependence on the geometry of the surface [41].
In real-life conditions, observations from one modality may be
difficult to interpret without additional information.
For example, in hyperspectral imaging, on a flat surface,
reflected light depends on the abundance (proportion of a ma-
terial in a pixel) and on the endmember (pure material present
in a pixel) reflectance. On a non-flat surface, the reflected light
depends also on the topography, which may induce variations
in scene illumination and scattering. Therefore, in non-flat
conditions, one cannot accurately extract material content
information from optical data alone. Adding a modality that
reports on the topography, such as LiDAR, is necessary to
resolve spectra accurately [43].
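To make the abundance/endmember terminology concrete, here is a minimal per-pixel unmixing sketch of our own on synthetic data: given the endmember spectra as the columns of E, each pixel's abundances are estimated by nonnegative least squares. A complete method would also enforce the sum-to-one constraint and, per the discussion above, account for topography-induced illumination effects.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_bands, n_endmembers, n_pixels = 50, 3, 100

E = rng.uniform(0.0, 1.0, (n_bands, n_endmembers))          # synthetic endmember spectra
A_true = rng.dirichlet(np.ones(n_endmembers), n_pixels).T   # abundances, sum to one
X = E @ A_true + 0.005 * rng.standard_normal((n_bands, n_pixels))  # observed pixels

# Per-pixel nonnegative least squares (sum-to-one not enforced in this sketch).
A_hat = np.column_stack([nnls(E, X[:, p])[0] for p in range(n_pixels)])
print(np.abs(A_hat - A_true).mean())                        # small mean abundance error
```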
As an active initiative, we point out the yearly data fusion
contest of the IEEE Geoscience and Remote Sensing Society
(GRSS) (see dedicated paper in this issue [44]). Problems
addressed include multi-modal change detection, in which the
purpose is to detect changes in an area before and after an
event (a flood, in this case), given SAR and multispectral
imaging [45], using either all or part of the modalities; multi-
modal multi-temporal data fusion of optical, SAR and LiDAR
images taken at different years over the same urban area [41],
where suggested applications include assessing urban density,
change detection and overcoming adverse illumination con-
ditions for optical sensors; and proposing new methods for
fusing hyperspectral and LiDAR data of the same area, e.g., for
improved classification of objects [42].
Example II-C.2: Meteorological Monitoring. Accurate mea-
surements of atmospheric phenomena such as rain, water
vapour, dew, fog and snow are required for meteorological
analysis and forecasting, as well as for numerous applications
in hydrology, agriculture and aeronautical services. Data can
be acquired from various devices such as rain gauges, radars,
satellite-borne remote sensing devices (see Example II-C.1),
and recently also by exploiting existing commercial microwave
links [46]. Rain gauges, as an example, are simply cups that
collect the precipitation. Though the most direct and reliable
technique, they have a small sampling area, implying very
localized representativeness and thus poor spatial resolution (e.g., [46],
[47]). Rain gauges may be read automatically at intervals as
short as seconds. Satellites observe Earth at different frequen-
cies, including visible, microwave, infrared, and shortwave
infrared to report on various atmospheric phenomena such
as water vapour content and temperature. The accuracy of
radar rainfall estimation may be affected by topography, beam
effects, distance from the radar, and other complicating factors.
Radars and satellite systems provide large spatial coverage;
however, they are less accurate in measuring precipitation
at ground level (e.g., [48]). Microwave links are deployed
by cellular providers for backhaul communication between
base stations. The signals transmitted by the base stations
are influenced by various atmospheric phenomena (e.g., [49]),
primarily attenuation due to rainfall [46], [47]. These changes
in signal strength are recorded at predefined time intervals and
kept in the cellular provider’s logs. Hence, the precipitation
data are in fact obtained by “reverse engineering” this information. The
microwave links’ measurements provide average precipitation
on the entire link and close to ground level [46]. Altogether,
these technologies are largely complementary in their ability to
detect and distinguish between different meteorological phe-
nomena, spatial coverage, temporal resolution, measurement
error, and other properties. Therefore, meteorological data are
often combined for better accuracy, coverage and resolution;
see, e.g., [19], [47], [48] and references therein.
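The rain-induced attenuation mentioned above is commonly modelled by a power law: the specific attenuation k (dB/km) relates to the rain rate R (mm/h) as k = aR^b, with a and b depending on link frequency and polarization. The sketch below simply inverts this relation; the coefficient values are illustrative placeholders of our own, not calibrated constants.

```python
def rain_rate_from_attenuation(attenuation_db, link_length_km, a=0.12, b=1.1):
    """Invert the power law k = a * R**b, where k is the rain-induced
    specific attenuation (dB/km) over the link. a and b are frequency- and
    polarization-dependent; the defaults here are placeholders only."""
    k = attenuation_db / link_length_km  # specific attenuation, dB/km
    return (k / a) ** (1.0 / b)          # path-averaged rain rate, mm/h

# Example: a 5 km link observing 9 dB of rain-induced attenuation.
print(rain_rate_from_attenuation(9.0, 5.0))
```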
Example II-C.3: Cosmology. A major endeavour in astron-
omy and astrophysics is understanding the formation of our
Universe. Recent results include robust support for the six-
parameter standard model of cosmology, describing a Universe
dominated by Cold Dark Matter and a cosmological constant Λ,
known as ΛCDM [50], [51]. The purpose of ongoing and
planned sky surveys is to decrease the allowable uncertainty
volume of the six-dimensional ΛCDM parameter space and to
improve the constraints on the other cosmological parameters
that depend on it [51]. The goal is to validate (or disprove)
the standard model.
A major difficulty in astrophysics and cosmology is the
absence of ground truth. This is because cosmological pro-
cesses involve very high energies and masses, and large space and
time scales, which make experimental study prohibitive. The lack of
ground truth and experimental support implied that, from its
very beginning, cosmological research had to rely on cross-
validation of outcomes of different observations, numerical
