Multimodal Data Fusion: An Overview
of Methods, Challenges and Prospects
Dana Lahat, Tülay Adalı, Fellow, IEEE, and Christian Jutten, Fellow, IEEE
Abstract—In various disciplines, information about the same
phenomenon can be acquired from different types of detectors, at
different conditions, in multiple experiments or subjects, among
others. We use the term “modality” for each such acquisition
framework. Due to the rich characteristics of natural phenomena,
it is rare that a single modality provides complete knowledge
of the phenomenon of interest. The increasing availability of
several modalities reporting on the same system introduces new
degrees of freedom, which raise questions beyond those related to
exploiting each modality separately. As we argue, many of these
questions, or “challenges”, are common to multiple domains.
This paper deals with two key questions: “why we need data
fusion” and “how we perform it”. The first question is motivated
by numerous examples in science and technology, followed by a
mathematical framework that showcases some of the benefits that
data fusion provides. In order to address the second question,
“diversity” is introduced as a key concept, and a number of data-
driven solutions based on matrix and tensor decompositions are
discussed, emphasizing how they account for diversity across the
datasets. The aim of this paper is to provide the reader, regardless
of his or her community of origin, with a taste of the vastness
of the field, the prospects and opportunities that it holds.
Index Terms—Data fusion, multimodality, multiset data
analysis, latent variables, tensor, overview.
I. INTRODUCTION
Information about a phenomenon or a system of interest can
be obtained from different types of instruments, measurement
techniques, experimental setups, and other types of sources.
Due to the rich characteristics of natural processes and envi-
ronments, it is rare that a single acquisition method provides
complete understanding thereof. The increasing availability of
multiple datasets that contain information, obtained using dif-
ferent acquisition methods, about the same system, introduces
new degrees of freedom that raise questions beyond those
related to analysing each dataset separately.
The foundations of modern data fusion have been laid in the
first half of the 20th century [1], [2]. Joint analysis of multiple
datasets has since been the topic of extensive research, and
took a significant leap forward in the late 1960s and early
1970s with the formulation of concepts and techniques such
as multi-set canonical correlation analysis (CCA) [3], parallel
factor analysis (PARAFAC) [4], [5], and other tensor decom-
positions [6], [7]. However, until rather recently, in most cases,
these data fusion methodologies were confined within the
limits of psychometrics and chemometrics, the communities
in which they evolved. With recent technological advances, in
a growing number of domains, the availability of datasets that
correspond to the same phenomenon has increased, leading to
increased interest in exploiting them efficiently. Many of the
providers of multi-view, multirelational, and multimodal data
are associated with high-impact commercial, social, biomedical,
environmental, and military applications, and thus the
drive to develop new and efficient analytical methodologies is
high and reaches far beyond pure academic interest.

(Affiliation and funding note: D. Lahat and Ch. Jutten are with GIPSA-Lab,
UMR CNRS 5216, Grenoble Campus, BP46, F-38402 Saint Martin d'Hères, France.
T. Adalı is with the Department of CSEE, University of Maryland, Baltimore
County, Baltimore, MD 21250, USA. Email: {Dana.Lahat, Christian.Jutten}@gipsa-lab.grenoble-inp.fr, adali@umbc.edu.
This work is supported by the project CHESS, 2012-ERC-AdG-320684 (D. Lahat
and Ch. Jutten) and by grants NSF-IIS 1017718 and NSF-CCF 1117056 (T. Adalı).
GIPSA-Lab is a partner of the LabEx PERSYVAL-Lab (ANR-11-LABX-0025).)
Motivations for data fusion are numerous. They include
obtaining a more unified picture and global view of the system
at hand; improving decision making; exploratory research; an-
swering specific questions about the system, such as identify-
ing common vs. distinctive elements across modalities or time;
and in general, extracting knowledge from data for various
purposes. However, despite the evident potential benefit, and
massive work that has already been done in the field (see, for
example, [8]–[16] and references therein), the knowledge of
how to actually exploit the additional diversity that multiple
datasets offer is still in its very early stages.
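Before turning to the challenges, here is a concrete taste of the matrix and tensor tools mentioned above: a minimal sketch of fitting a rank-R PARAFAC/CP model [4], [5] by alternating least squares. This is our own illustration, not an algorithm proposed in this paper; the khatri_rao helper, the unfolding conventions, and the synthetic test tensor are choices made purely for the example.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def cp_als(T, R, n_iter=200, seed=0):
    """Fit a rank-R CP (PARAFAC) model T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]
    by alternating least squares. Bare-bones: no normalization, no
    convergence check, no handling of degenerate solutions."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    # Mode-n unfoldings, with orderings chosen to match khatri_rao above.
    T1 = T.reshape(I, J * K)
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Recover the factors of a synthetic, exactly rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (8, 9, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # relative error, typically near zero
```

Even this toy hints at why such models are attractive: under mild conditions the CP decomposition is essentially unique, which is exactly the kind of benefit that the diversity discussion below attributes to exploiting more than one "mode" of the data.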
Data fusion is a challenging task for several reasons [8]–
[11], [17]–[19]. First, the data are generated by very complex
systems: biological, environmental, sociological, and psycho-
logical, to name a few, driven by numerous underlying pro-
cesses that depend on a large number of variables to which
we have no access. Second, due to the augmented diversity,
the number, type and scope of new research questions that can
be posed is potentially very large. Third, working with hetero-
geneous datasets such that the respective advantages of each
dataset are maximally exploited, and drawbacks suppressed,
is not a straightforward task. We elaborate on these matters in the
following sections. Most of these questions have been posed
only in recent years, and, as we show in the sequel,
only a fraction of their potential has already been exploited.
Hence, we refer to them as “challenges”.
A rather wide perspective on challenges in data fusion is
presented by [8], which discusses linked-mode decomposition
models within the framework of chemometrics and psycho-
metrics, and [9], which focuses on “automated decision
making” with special attention to multisensor information
fusion. In practice, however, challenges in data fusion are most
often brought up within a framework dedicated to a specific
application, model and dataset; examples will be given in the
sections that follow.
In this paper, we bring together a comprehensive (but
definitely not exhaustive) list of challenges in data fusion.

Following from [8], [9], [16], [19] (and others), and further
emphasized by our discussion in this paper, it is clear that
at the appropriate level of abstraction, the same challenge in
data fusion can be relevant to completely different and diverse
applications, goals and data types. Consequently, a solution to
a challenge that is based on a sufficiently data-driven, model-
free approach may turn out to be useful in very different
domains. Therefore, there is an obvious interest in opening
up the discussion of data fusion challenges to include and
involve disparate communities, so that each community could
inform the others. Our goal is to stimulate and emphasize the
relevance and importance of a perspective based on challenges
to advanced data fusion. More specifically, we would like
to promote data-driven approaches, that is, approaches with
minimal and weak priors and constraints, such as sparsity,
non-negativity, low-rank and independence, among others,
that can be useful to more than one specific application or
dataset. Hence, we present these challenges in quite a general
framework that is not specific to an application, goal or data
type. We also give examples and motivations from different
domains.
In order to contain our discussion, we focus on setups in
which a phenomenon or a system is observed using multiple
instruments, measurement devices or acquisition techniques. In
this case, each acquisition framework is denoted as a modality
and is associated with one dataset. The whole setup, in which
one has access to data obtained from multiple modalities, is
known as multimodal. A key property of multimodality is
complementarity, in the sense that each modality brings to
the whole some type of added value that cannot be deduced
or obtained from any of the other modalities in the setup.
In mathematical terms, this added value is known as diversity.
Diversity makes it possible to reduce the number of degrees of freedom in
the system by providing constraints that enhance uniqueness,
interpretability, robustness, performance, and other desired
properties, as will be illustrated in the rest of this paper.
Diversity can be found in a broad range of scenarios and plays
a key role in a wide scope of mathematical and engineering
studies. Accordingly, we suggest the following operative def-
inition for the special type of diversity that is associated with
multimodality:
Definition I.1: Diversity (due to multimodality) is the
property that makes it possible to enhance the uses, benefits, and
insights (such as those discussed in Section II), in a way
that cannot be achieved with a single modality.
Diversity is the key to data fusion, as will be explained in
Section III. Furthermore, in Section III, we demonstrate how a
diversity approach to data fusion can provide a fresh new look
on previously well-known and well-founded data and signal
processing techniques.
As already noted, “data fusion” is quite a diffuse concept
that takes on different interpretations depending on the application
and goal [8], [9], [20]. Therefore, within the context of this
paper, and in accordance with the types of problems on
which we focus, our emphasis is on the following tighter
interpretation [21]:
Definition I.2: Data fusion is the analysis of several
datasets such that different datasets can interact and
inform each other.
This concept will be given a more concrete meaning in
Sections III and V.
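As a toy concretization of Definition I.2 (our own sketch, not a method advocated by this paper), consider two datasets X1 and X2 that are assumed to share a common latent factor A, as in X1 ≈ AB1ᵀ and X2 ≈ AB2ᵀ. Fitting the two factorizations jointly is one elementary way of letting the datasets "interact and inform each other":

```python
import numpy as np

def coupled_factorization(X1, X2, R, n_iter=200, seed=0):
    """Jointly fit X1 ~ A @ B1.T and X2 ~ A @ B2.T with a *shared* factor A,
    by alternating least squares. The coupling through A is what lets the
    two datasets interact and inform each other (cf. Definition I.2)."""
    rng = np.random.default_rng(seed)
    B1 = rng.standard_normal((X1.shape[1], R))
    B2 = rng.standard_normal((X2.shape[1], R))
    for _ in range(n_iter):
        # The update of the shared factor sees both datasets at once.
        A = np.hstack([X1, X2]) @ np.linalg.pinv(np.vstack([B1, B2]).T)
        B1 = X1.T @ np.linalg.pinv(A.T)
        B2 = X2.T @ np.linalg.pinv(A.T)
    return A, B1, B2

# Toy usage: two noisy views generated from the same latent A0.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((200, 3))
X1 = A0 @ rng.standard_normal((3, 40)) + 0.1 * rng.standard_normal((200, 40))
X2 = A0 @ rng.standard_normal((3, 60)) + 0.1 * rng.standard_normal((200, 60))
A, B1, B2 = coupled_factorization(X1, X2, R=3)
```

Factorizing X1 alone leaves A determined only up to an arbitrary invertible transform; estimating A from both views simultaneously is a first example of the extra constraints that data fusion brings.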
The goal of this paper is to provide some ideas, perspec-
tives, and guidelines as to how to approach data fusion. This
paper is not a review, not a literature survey, not a tutorial
nor a cookbook. As such, it does not propose or promote
any specific solution or method. On the contrary, our message
is that whatever specific method or approach is considered,
it should be kept in mind that it is just one among a very
large set, and should be critically judged as such. In the same
vein, any example in this paper should only be regarded as a
concretization of a much broader idea.
How to read this paper? In order to make this paper
accessible for readers with various interests and back-
grounds, it is organized in two types of cross-sections.
The first part (Sections II–III) deals with the question
“why?”, i.e., why we need data fusion. The second part
(Sections IV–V) deals with the question “how?”, i.e., how
we perform data fusion. Each question is treated on two
levels: data (Sections II and IV), and theory (Sections III
and V). More specifically, Section II presents the concepts
of multimodality and data fusion, and motivates them
using examples from various applications. In Section III
we introduce the concept of diversity as a key to data
fusion, and give it a concrete mathematical formulation.
Section IV discusses complicating factors that should be
addressed in the actual processing of heterogeneous data.
Section V gives some guidelines as to how to actually
approach a data fusion problem from a model design
perspective. Section VI concludes our work.
II. WHAT IS MULTIMODALITY? WHY DO WE NEED
MULTIMODALITY?
For living creatures, multimodality is a very natural concept.
Living creatures use external and internal sensors, sometimes
denoted as “senses”, in order to detect and discriminate among
signals, communicate, cross-validate, disambiguate, and add
robustness to numerous life-and-death choices and responses
that must be taken rapidly, in a dynamic and constantly
changing internal and external environment.
The well-accepted paradigm that certain natural processes
and phenomena can express themselves under completely
different physical guises is the raison d’être of multimodal
data fusion. Too often, however, very little is known about the
underlying relationships among these modalities. Therefore,
the most obvious and essential endeavour to be undertaken
in any multimodal data analysis task is exploratory: to learn
about relationships between modalities, their complementarity,
shared vs. modality-specific information content, and other
mutual properties.
In this section, we try to provide, by numerous practical
examples, a more concrete sense of what we mean when we
speak of “diversity” and “multimodality”. The examples below
illustrate the complementary nature of multimodal data, and
as a result, some of the prominent uses, benefits and insights
that can be obtained from properly exploiting multimodal data,
especially as opposed to the analysis of single-set and single-
modal data. They also present various complicating factors,
due to which multimodal data fusion is not a straightforward task.
The purpose of this section is to show that multimodality is
already present in almost every field of science and technology,
and thus it is of potential interest to everyone.
A. Multisensory Systems
Example II-A.1: Audio-Visual Multimodality. Audio-visual
multimodality is probably the most intuitive, since it uses
two of our most informative senses. Most human verbal com-
munication involves seeing the speaker [18]. Indeed, a large
number of audio-visual applications involve human speech and
vision. In such applications, it is usually the audio channel that
conveys the information of interest. It is well-known that audio
and video convey complementary information. Audio has the
advantage over video that it does not require line of sight.
On the other hand, the visual modality is resistant to various
factors that make audio and speech processing difficult, such as
ambient noise, reverberations, and other acoustic disturbances.
Perhaps the most striking evidence of the caution that
must be exercised in the design and use of multimodal
systems is the “McGurk effect” [18]. In their seminal paper,
McGurk and MacDonald [18] showed that presenting
contradictory, or discrepant, speech [“ba”] and visual lip
movements [“ga”] can cause a human to perceive a completely
different syllable [“da”]. These unexpected results have since
been the subject of ongoing exploratory research on human
perception and cognition [22, Section VI.A.5]. The McGurk
effect serves as an indication that in real-life scenarios, data
fusion can take paths much more intricate than simple sum-
mation of information. No less important, it serves as a lesson
that fusing modalities can yield undesired results and severe
degradation of performance if the underlying relationships
between modalities are not properly understood.
Nowadays, audio-visual multimodality is used for a broad
range of applications [10], [23]. Examples include: speech
processing, including speech recognition, speech activity de-
tection, speech enhancement, speaker extraction and separa-
tion; scene analysis, for example tracking a speaker within
a group, biometrics and monitoring, for safety and security
applications [24]; human-machine interaction (HMI) [10];
calibration [25] [10, Section V.C]; and more.
Example II-A.2: Human-Machine Interaction. A domain
that is heavily inspired by natural multimodality is HMI. In
HMI, an important task is to design modalities that will make
HMI as natural, efficient and intuitive as possible [11]. The
idea is to combine multiple interaction modes based on audio-
vision, touch, smell, movement (e.g., gesture detection and
user tracking), interpretation of human language commands,
and other multisensory functions [10], [11]. The principal
point that makes HMI stand out among other multimodal
applications that we mention is that, in HMI, the modalities
are often interactive (as their name implies). Unlike other
multimodal applications that we mention, not one but two
very different types of systems (human and machine) are “ob-
served” by each other’s sensors, and the goal of data fusion is
not only to interpret each system’s output, but also to actively
convey information between these two systems. An added
challenge is that this task should usually be accomplished
in real-time. An additional complicating factor that makes
multimodal HMI stand out is that the human
user often plays an active part in the choice of modalities
(from the available set) and in the way that they are used
in practice. This implies that the design of the multimodal
setup and data fusion procedure must rely not only on the
theoretically and technologically optimal combination of data
streams but also on the ability to predict and adapt to the
subjective cognitive preferences of the individual user. We
refer to [11] (and references therein) for further discussion
of these aspects.
B. Biomedical, Health
Example II-B.1: Understanding Brain Functionality. Func-
tional brain study deals with understanding how the different
elements of the brain take part in various perceptual and
cognitive activities. Functional brain study largely relies on
non-invasive imaging techniques, whose purpose is to recon-
struct a high-resolution spatio-temporal image of the neuronal
activity within the brain. The neuronal activity within the brain
generates ionic currents that are often modelled as dipoles.
These dipoles induce electric and magnetic fields that can
be directly recorded by electroencephalography (EEG) and
magnetoencephalography (MEG), respectively. In addition,
neuronal activity induces changes in magnetization between
oxygen-rich and oxygen-poor blood, known as the haemody-
namic response. This effect, also called blood-oxygen-level
dependent (BOLD) changes, can be detected by functional
magnetic resonance imaging (fMRI). Therefore, fMRI is an
indirect measure of neuronal activity. These three modalities
register data at regular time intervals and thus reflect temporal
dynamics. However, these techniques vary greatly in their
spatio-temporal resolutions: EEG and MEG data provide high
temporal [millisecond] resolution, whereas fMRI images have
low temporal [second] resolution. fMRI data are a set of
high-resolution 3D images, taken at regular time intervals,
representing the whole volume of the brain of a patient lying
in an fMRI scanner. EEG and MEG data are a set of time-
series signals reflecting voltage or neuromagnetic field changes
recorded at each of the (usually a few dozen) electrodes
attached to the scalp (EEG) or fixed within an MEG scanner
helmet. The sensitivity of EEG and MEG to deep-brain signals
is limited. In addition, they have different selectivity to signals
as a function of brain morphology. Therefore, they provide
data at much poorer spatial resolution and do not have access
to the full brain volume. Consequently, the spatio-temporal in-
formation provided by EEG, MEG and fMRI is highly comple-
mentary. Functional imaging techniques can be complemented
by other modalities that convey structural information. For
example, structural magnetic resonance imaging (sMRI) and
diffusion tensor imaging (DTI) report on the structure of the
brain in terms of gray matter, white matter and cerebrospinal
fluid. sMRI is based on nuclear magnetic resonance of water
protons. DTI measures the diffusion process of molecules,
mainly water, and thus reports also on brain connectivity. Each
of these methods is based on different physical principles and
is thus sensitive to different types of properties within the
brain. In addition, each method has different pros and cons in
terms of safety, cost, accuracy, and other parameters. Recent
technological advances allow recording data from several
functional brain imaging techniques simultaneously [26], [27],
thus further motivating advanced data fusion.
It is a well-accepted paradigm in neuroscience that EEG
and fMRI carry complementary information about brain func-
tion [26], [28]. However, their very heterogeneous nature and
the fact that brain processes are very complicated systems that
depend on numerous latent phenomena imply that simultane-
ously extracting useful information from them is not a simple
task. The fact that there is no ground truth is reflected in the
very broad range of methods and approaches that are being
proposed [12], [15], [17], [21], [28]–[31]. Works on biomed-
ical brain imaging often emphasize the exploratory nature of
this task. Despite decades of study, the underlying relationship
between EEG and fMRI is far from being understood [17],
[29], [30], [32].
A well-known challenge in brain imaging is the EEG inverse
problem. A prevalent assumption is that the measured EEG
signal is generated by numerous current dipoles within the
brain, and the goal is to localise the origins of this neuronal
activity. Often formulated as a linear inverse problem, it is
ill-posed: many different spatial current patterns within the
skull can give rise to identical measurements [33]. In order
to make the problem well-posed, additional hypotheses
are required. A large number of solutions are based on
adding various priors to the EEG data [34]. Alternatively, an
identifiable and unique solution can be obtained using spatial
constraints from fMRI [12], [22], [30].
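To make the last point concrete, here is a toy sketch of our own (not one of the cited methods): a weighted minimum-norm solution of y = Lx + n in which sources flagged as active by fMRI receive larger prior weights, steering the otherwise non-unique EEG solution. The leadfield, dimensions, regularization, and weighting scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 32, 500                   # hypothetical dimensions
L = rng.standard_normal((n_sensors, n_sources))  # toy leadfield (a real one comes from a head model)

# Ground truth: two active sources; suppose fMRI flags exactly these locations.
x_true = np.zeros(n_sources)
x_true[[100, 300]] = [1.0, -1.5]
y = L @ x_true + 0.01 * rng.standard_normal(n_sensors)

def wmne(y, L, w, lam=1e-2):
    """Weighted minimum-norm estimate: argmin_x ||y - L x||^2 + lam * x' W^-1 x,
    with W = diag(w**2); closed form x = W L' (L W L' + lam I)^-1 y."""
    W = np.diag(w ** 2)
    G = L @ W @ L.T + lam * np.eye(L.shape[0])
    return W @ L.T @ np.linalg.solve(G, y)

w_flat = np.ones(n_sources)                      # no spatial prior
w_fmri = np.ones(n_sources)
w_fmri[[100, 300]] = 10.0                        # fMRI-informed spatial prior
for w, name in [(w_flat, 'flat prior'), (w_fmri, 'fMRI-weighted prior')]:
    x_hat = wmne(y, L, w)
    # Correlation with ground truth; typically much higher with the fMRI prior.
    print(name, np.corrcoef(x_hat, x_true)[0, 1])
```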
Example II-B.2: Medical Diagnosis. Various medical condi-
tions such as potentially malignant tumours cannot be diag-
nosed by a single type of measurement due to many factors
such as low sensitivity, low positive predictive values, low
specificity (high false-positive rate), a limited number of spatial
samples (as in biopsy), and other limitations of the various
assessment techniques. In order to improve the performance
of the diagnosis, risk assessment and therapy options, it is
necessary to perform numerous medical assessments based on
a broad range of medical diagnostic techniques [35], [36]. For
example, one can augment physical examination, blood-tests,
biopsies, static and functional magnetic resonance imaging,
with other parameters such as genetic, environmental and
personal risk factors. The question of how to analyse all these
simultaneously available resources is largely open. Currently,
this task relies mostly on human medical experts. One of the
main challenges is the automation of such decision procedures,
in order to improve correct interpretation, as well as save costs
and time [35].
Example II-B.3: Developing Non-Invasive Medical Diag-
nosis Techniques. In some cases, the use of multimodal
data fusion is only a first step in the design of a single-
modal system. In [37], the challenge is understanding the
link between surface and intra-cardiac electrodes measuring
the same atrial fibrillation event and the goal is eventually
extracting relevant atrial fibrillation activity using only the
non-invasive modality. For this aim, the intra-cardiac modality
is exploited as a reference to guide the extraction of an atrial
electrical signal of interest from non-invasive electrocardiog-
raphy (ECG) recordings. The difficulty lies in the fact that the
intra-cardiac modality provides a rather pure signal whereas
the ECG signal is a mixture of the desired signal with other
sources, and the mixing model is unknown.
Example II-B.4: Smart Patient Monitoring. Health moni-
toring using multiple types of sensors is drawing increasing
attention from modern health services. The goal is to provide
a set of non-invasive, non-intrusive, reasonable-cost sensors
that allow the patient to run a normal life while providing
reliable warnings in real-time. Here, we focus on monitoring,
predicting, and warning epileptic patients of potentially
dangerous seizures [38]. The gold standard in monitoring
epileptic seizures is combining EEG and video, where EEG
is manually analysed by experts and the whole diagnostic
procedure requires a stay of up to several days in a hospital
setting. This procedure is expensive, time consuming, and
physically inconvenient for the patient. Obviously, it is not
practical for daily life. While much effort has already been
dedicated to the prediction of epileptic seizures from EEG,
with no clear-cut results so far, a considerable proportion of
potentially lethal seizures are hardly detectable by EEG at
all. Therefore, a primary challenge is to understand the link
between epileptic seizures and additional body parameters:
movement, breathing, heart-rate, and others. Due to the fact
that epileptic seizures vary within and across patients, and due
to the complex relations between different body systems, it is
likely that any such system should rely on more than one
modality [38].
C. Environmental Studies
Example II-C.1: Remote Sensing and Earth Observations.
Various sensor technologies can report on different aspects
of objects on Earth. Passive optical hyperspectral (resp. mul-
tispectral) imaging technologies report on material content
of the surface by reconstructing its spectral characteristics
from hundreds of (resp. a few) narrow (resp. broad) adjacent
spectral bands within the visible range and beyond. A third
type of optical sensor is panchromatic imaging, which
generates a monochromatic image with a much broader band.
Typical spatial resolutions of hyperspectral, multispectral and
panchromatic images are tens of meters, a few meters and
less than one meter, respectively. Hence, there exists a trade-
off between spectral and spatial resolution [39], [40] [13,
Chapter 9]. Topographic information can be acquired from
active sensors such as light detection and ranging (LiDAR)
and synthetic aperture radar (SAR). LiDAR is based on a
narrow pulsed laser beam and thus provides highly accurate
information about distance to objects, i.e., altitude. SAR is
based on radio waves that illuminate a rather wide area,
and the backscattered components reaching the sensor are
registered; interpreting the reflections from the surface requires
some additional processing with respect to (w.r.t.) LiDAR.
Both technologies can provide information about elevation,
three-dimensional structure of the observed objects, and their
surface properties. LiDAR, being based on a laser beam,
generally reports on the structure of the surface, although it
can partially penetrate through certain areas such as forest
canopy, providing information on the internal structure of the
trees, for example. This ability is a mixed blessing, however,
since it generates reflections that have to be accounted for.
SAR and LiDAR use different electromagnetic frequencies
and thus interact differently with materials and surfaces. As
an example, depending on the wavelength, SAR may see the
canopy as a transparent object (waves reach the soil under
the canopy), semi-transparent (they penetrate in the canopy
and interact with it) or opaque (they are reflected by the top
of the canopy). Optical techniques are passive, which implies
that they rely on natural illumination. Active sensors such as
LiDAR and SAR can operate at night and in shaded areas [41].
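One classical, simple instance of fusing such complementary optical products is pansharpening: injecting the spatial detail of a panchromatic image into an upsampled multispectral cube. The Brovey-style sketch below is our own minimal illustration on synthetic arrays; real pipelines use co-registered imagery and far more careful resampling and radiometric matching.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-8):
    """Brovey-style component substitution: upsample the multispectral (MS)
    cube to the panchromatic (PAN) grid, then rescale every band by the
    ratio of PAN to the MS intensity, injecting PAN's spatial detail.
    ms: (h, w, bands) low-res; pan: (r*h, r*w) high-res, integer ratio r."""
    r = pan.shape[0] // ms.shape[0]
    up = ms.repeat(r, axis=0).repeat(r, axis=1)  # nearest-neighbour upsampling
    intensity = up.mean(axis=2)                  # crude intensity component
    return up * (pan / (intensity + eps))[..., None]

# Toy usage on random (but positive) data.
rng = np.random.default_rng(0)
ms = rng.uniform(0.1, 1.0, (16, 16, 4))          # 4-band MS on a coarse grid
pan = rng.uniform(0.1, 1.0, (64, 64))            # PAN at 4x the resolution
print(brovey_pansharpen(ms, pan).shape)          # -> (64, 64, 4)
```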
Beyond the strengths and weaknesses of each technology
w.r.t. the others, the use of each is limited by a certain in-
herent ambiguity. For example, hyperspectral imaging cannot
distinguish between objects made of the same material that
are positioned at different elevations, such as concrete roofs
and roads. LiDAR cannot distinguish between objects with
the same elevation and surface roughness that are made of
different materials such as natural and artificial grass [42].
SAR images may sometimes be difficult to interpret due to
their complex dependence on the geometry of the surface [41].
In real-life conditions, observations from one modality may be
difficult to interpret without additional information.
For example, in hyperspectral imaging, on a flat surface,
reflected light depends on the abundance (proportion of a ma-
terial in a pixel) and on the endmember (pure material present
in a pixel) reflectance. On a non-flat surface, the reflected light
depends also on the topography, which may induce variations
in scene illumination and scattering. Therefore, in non-flat
conditions, one cannot accurately extract material content
information from optical data alone. Adding a modality that
reports on the topography, such as LiDAR, is necessary to
resolve spectra accurately [43].
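To make the abundance/endmember terminology concrete, here is a minimal per-pixel unmixing sketch of our own on synthetic data: given the endmember spectra as the columns of E, each pixel's abundances are estimated by nonnegative least squares. A complete method would also enforce the sum-to-one constraint and, per the discussion above, account for topography-induced illumination effects.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_bands, n_endmembers, n_pixels = 50, 3, 100

E = rng.uniform(0.0, 1.0, (n_bands, n_endmembers))          # synthetic endmember spectra
A_true = rng.dirichlet(np.ones(n_endmembers), n_pixels).T   # abundances, sum to one
X = E @ A_true + 0.005 * rng.standard_normal((n_bands, n_pixels))  # observed pixels

# Per-pixel nonnegative least squares (sum-to-one not enforced in this sketch).
A_hat = np.column_stack([nnls(E, X[:, p])[0] for p in range(n_pixels)])
print(np.abs(A_hat - A_true).mean())                        # small mean abundance error
```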
As an active initiative, we point out the yearly data fusion
contest of the IEEE Geoscience and Remote Sensing Society
(GRSS) (see dedicated paper in this issue [44]). Problems
addressed include multi-modal change detection, in which the
purpose is to detect changes in an area before and after an
event (a flood, in this case), given SAR and multispectral
imaging [45], using either all or part of the modalities; multi-
modal multi-temporal data fusion of optical, SAR and LiDAR
images taken at different years over the same urban area [41],
where suggested applications include assessing urban density,
change detection and overcoming adverse illumination con-
ditions for optical sensors; and proposing new methods for
fusing hyperspectral and LiDAR data of the same area, e.g., for
improved classification of objects [42].
Example II-C.2: Meteorological Monitoring. Accurate mea-
surements of atmospheric phenomena such as rain, water
vapour, dew, fog and snow are required for meteorological
analysis and forecasting, as well as for numerous applications
in hydrology, agriculture and aeronautical services. Data can
be acquired from various devices such as rain gauges, radars,
satellite-borne remote sensing devices (see Example II-C.1),
and recently also by exploiting existing commercial microwave
links [46]. Rain gauges, as an example, are simply cups that
collect the precipitation. Though the most direct and reliable
technique, they have a small sampling area, implying very
localized representativeness and thus poor spatial resolution (e.g., [46],
[47]). Rain gauges may be read automatically at intervals as
short as seconds. Satellites observe Earth at different frequen-
cies, including visible, microwave, infrared, and shortwave
infrared to report on various atmospheric phenomena such
as water vapour content and temperature. The accuracy of
radar rainfall estimation may be affected by topography, beam
effects, distance from the radar, and other complicating factors.
Radars and satellite systems provide large spatial coverage;
however, they are less accurate in measuring precipitation
at ground level (e.g., [48]). Microwave links are deployed
by cellular providers for backhaul communication between
base stations. The signals transmitted by the base stations
are influenced by various atmospheric phenomena (e.g., [49]),
primarily attenuation due to rainfall [46], [47]. These changes
in signal strength are recorded at predefined time intervals and
kept in the cellular provider’s logs. Hence, the precipitation
data are in fact obtained by “reverse engineering” this information. The
microwave links’ measurements provide average precipitation
on the entire link and close to ground level [46]. Altogether,
these technologies are largely complementary in their ability to
detect and distinguish between different meteorological phe-
nomena, spatial coverage, temporal resolution, measurement
error, and other properties. Therefore, meteorological data are
often combined for better accuracy, coverage and resolution;
see, e.g., [19], [47], [48] and references therein.
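The rain-induced attenuation mentioned above is commonly modelled by a power law: the specific attenuation k (dB/km) relates to the rain rate R (mm/h) as k = aR^b, with a and b depending on link frequency and polarization. The sketch below simply inverts this relation; the coefficient values are illustrative placeholders of our own, not calibrated constants.

```python
def rain_rate_from_attenuation(attenuation_db, link_length_km, a=0.12, b=1.1):
    """Invert the power law k = a * R**b, where k is the rain-induced
    specific attenuation (dB/km) over the link. a and b are frequency- and
    polarization-dependent; the defaults here are placeholders only."""
    k = attenuation_db / link_length_km  # specific attenuation, dB/km
    return (k / a) ** (1.0 / b)          # path-averaged rain rate, mm/h

# Example: a 5 km link observing 9 dB of rain-induced attenuation.
print(rain_rate_from_attenuation(9.0, 5.0))
```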
Example II-C.3: Cosmology. A major endeavour in astron-
omy and astrophysics is understanding the formation of our
Universe. Recent results include robust support for the six-
parameter standard model of cosmology, describing a Universe
dominated by Cold Dark Matter and a cosmological constant Λ,
known as ΛCDM [50], [51]. The purpose of ongoing and
planned sky surveys is to decrease the allowable uncertainty
volume of the six-dimensional ΛCDM parameter space and to
improve the constraints on the other cosmological parameters
that depend on it [51]. The goal is to validate (or disprove)
the standard model.
A major difficulty in astrophysics and cosmology is the
absence of ground truth. This is because cosmological pro-
cesses involve very high energies and masses, and large space and
time scales, which make experimental study prohibitive. The lack of
ground truth and experimental support implied that, from its
very beginning, cosmological research had to rely on cross-
validation of outcomes of different observations, numerical
