Multisource and Multitemporal Data Fusion in
Remote Sensing
Pedram Ghamisi, Senior Member, IEEE, Behnood Rasti, Member, IEEE, Naoto Yokoya, Member, IEEE,
Qunming Wang, Bernhard Höfle, Lorenzo Bruzzone, Fellow, IEEE, Francesca Bovolo, Senior Member, IEEE,
Mingmin Chi, Senior Member, IEEE, Katharina Anders, Richard Gloaguen,
Peter M. Atkinson, and Jón Atli Benediktsson, Fellow, IEEE
Abstract—The final version of the paper can be found in IEEE Geoscience and Remote Sensing Magazine.
The sharp and recent increase in the availability of data captured by different sensors, combined with their considerably heterogeneous natures, poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets in a joint manner to further improve the performance of the processing approaches with respect to the application at hand. Multisource data fusion has, therefore, received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the integration of the temporal information with the spatial and/or spectral/backscattering information of the remotely sensed data is possible and helps to move from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as challenges for the information extraction algorithms. There are a huge number of research works dedicated to multisource and multitemporal data fusion, but the methods for the fusion of different modalities have expanded along different paths according to each research community. This paper brings together the advances of multisource and multitemporal data fusion approaches with respect to different research communities and provides a thorough and discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) willing to conduct novel investigations on this challenging topic by supplying sufficient detail and references. More specifically, this paper provides a bird's-eye view of many important contributions specifically dedicated to the topics of pansharpening and resolution enhancement, point cloud data fusion, hyperspectral and LiDAR data fusion, multitemporal data fusion, as well as big data and social media. In addition, the main challenges and possible future research for each section are outlined and discussed.

Index Terms—Fusion; Multisensor Fusion; Multitemporal Fusion; Downscaling; Pansharpening; Resolution Enhancement; Spatio-Temporal Fusion; Spatio-Spectral Fusion; Component Substitution; Multiresolution Analysis; Subspace Representation; Geostatistical Analysis; Low-Rank Models; Filtering; Composite Kernels; Deep Learning.

The work of P. Ghamisi is supported by the "High Potential Program" of Helmholtz-Zentrum Dresden-Rossendorf.
P. Ghamisi and R. Gloaguen are with the Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Helmholtz Institute Freiberg for Resource Technology (HIF), Exploration, D-09599 Freiberg, Germany (emails: p.ghamisi@gmail.com, r.gloaguen@hzdr.de).
B. Rasti is with the Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland (email: behnood@hi.is).
N. Yokoya is with the RIKEN Center for Advanced Intelligence Project, RIKEN, 103-0027 Tokyo, Japan (email: naoto.yokoya@riken.jp).
Q. Wang is with the College of Surveying and Geo-Informatics, Tongji University, 1239 Siping Road, Shanghai 200092, China (email: wqm11111@126.com).
B. Höfle and K. Anders are with GIScience at the Institute of Geography, Heidelberg University, Germany (emails: hoefle@uni-heidelberg.de, katharina.anders@uni-heidelberg.de).
L. Bruzzone is with the Department of Information Engineering and Computer Science, University of Trento, Trento, Italy (email: lorenzo.bruzzone@unitn.it).
F. Bovolo is with the Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy (email: bovolo@fbk.eu).
M. Chi is with the School of Computer Science, Fudan University, China (email: mmchi@fudan.edu.cn).
P. M. Atkinson is with the Lancaster Environment Centre, Lancaster University, Lancaster, U.K. (email: pma@lancaster.ac.uk).
J. A. Benediktsson is with the Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland (email: benedikt@hi.is).
Manuscript received 2018.
I. INTRODUCTION
The volume of data produced by sensing devices has
increased exponentially in the last few decades, creating the
“Big Data” phenomenon, and leading to the creation of the
new field of “data science”, including the popularization of
“machine learning” and “deep learning” algorithms to deal
with such data [1]–[3]. In the field of remote sensing, the
number of platforms for producing remotely sensed data has
similarly increased, with an ever-growing number of satellites
in orbit and planned for launch, and new platforms for
proximate sensing such as unmanned aerial vehicles (UAVs)
producing very fine spatial resolution data. While optical
sensing capabilities have increased in quality and volume,
the number of alternative modes of measurement has also
grown including, most notably, airborne light detection and
ranging (LiDAR) and terrestrial laser scanning (TLS), which
produce point clouds representing elevation, as opposed to
images [4]. The number of synthetic aperture radar (SAR)
sensors, which measure RADAR backscatter, and satellite and
airborne hyperspectral sensors, which extend optical sensing
capabilities by measuring in a larger number of wavebands,
has also increased greatly [5], [6]. Airborne and spaceborne geophysical measurements, such as those from the Gravity Recovery and Climate Experiment (GRACE) satellite mission or airborne electromagnetic surveys, are also currently being considered.
In addition, there has been great interest in new sources of
ancillary data, for example, from social media, crowd sourcing,
scraping the internet and so on ([7]–[9]). These data have a
very different modality to remote sensing data, but may be
related to the subject of interest and, therefore, may add useful
information relevant to specific problems.
The remote sensors onboard the above platforms may vary
greatly in multiple dimensions; for example, the types of
properties sensed and the spatial and spectral resolutions of
the data. This is true even for sensors that are housed on the same platform (e.g., the many examples of multispectral and panchromatic sensors) or that are part of the same satellite configuration (e.g., the European Space Agency's (ESA's) series of Medium Resolution Imaging Spectrometer (MERIS)
sensors). The rapid increase in the number and availability of
data combined with their deeply heterogeneous natures creates
serious challenges for their effective and efficient processing
([10]). For a particular remote sensing application, there are
likely to be multiple remote sensing and ancillary datasets
pertaining to the problem, and this creates a dilemma: how best to combine the datasets for maximum utility? It is for this
reason that multisource data fusion, in the context of remote
sensing, has received so much attention in recent years [10]–
[13].
Fortunately, the above increase in the number and het-
erogeneity of data sources (presenting both challenge and
opportunity) has been paralleled by increases in computing
power, by efforts to make data more open, available and
interoperable, and by advances in methods for data fusion,
which are reviewed here [15]. There exists a very wide range of approaches to data fusion (e.g., [11]–[13]). This paper
seeks to review them by class of data modality (e.g., optical,
SAR, laser scanning) because methods for these modalities
have developed somewhat differently, according to each re-
search community. Given this diversity, it is challenging to
synthesize multisource data fusion approaches into a single
framework, and that is not the goal here. Nevertheless, a
general framework for measurement and sampling processes
(i.e., forward processes) is now described briefly to provide
greater illumination of the various data fusion approaches
(i.e., commonly inverse processes or with elements of inverse
processing) that are reviewed in the following sections. Because the topic of multisensor data fusion is extremely broad and specific aspects have already been reviewed, we have to restrict what is covered in this manuscript and, therefore, do not address a few topics, such as the fusion of SAR and optical data.
We start by defining the space and properties of interest.
In remote sensing, there have historically been considered to
be four dimensions in which information is provided. These
are: spatial, temporal, spectral, and radiometric; that is, 2D
spatially, 1D temporally, and 1D spectrally with “radiometric”
referring to numerical precision. The electromagnetic spectrum
(EMS) exists as a continuum and, thus, lends itself to high-
dimensional feature space exploration through definition of
multiple wavebands (spectral dimension). LiDAR and TLS,
in contrast to most optical and SAR sensors, measure a
surface in 3D spatially. Recent developments in photo- and radargrammetry, such as Structure from Motion (SfM) and InSAR, have increased the availability of 3D data. This
expansion of the dimensionality of interest to 3D in space
and 1D in time makes image and data fusion additionally
challenging [4]. The properties measured in each case vary,
with SAR measuring backscatter, optical sensors (including
hyperspectral) measuring the visible and infrared parts of
the EMS, and laser scanners measuring surface elevation in
3D. Only surface elevation is likely to be a primary interest,
whereas reflectance and backscatter are likely to be only
indirectly related to the property of interest.
Second, we define measurement processes. A common
“physical model” in remote sensing is one of four component
models: scene model, atmosphere model, sensor model, and
image model [16]–[21]. The scene model defines the subject
of interest (e.g., land cover, topographic surface), while the
atmosphere model is a transform of the EMS from surface
to sensor, the sensor model represents a measurement process
(e.g., involving a signal-to-noise ratio, the point spread func-
tion) and the image model is a sampling process (e.g., to create
the data as an image of pixels on a regular grid).
Third, the sampling process implied by the image model
above can be expanded and generalized to three key pa-
rameters (the sampling extent, the sampling scheme, and the
sampling support), each of which has four further parameters
(size, geometry, orientation, and position). The support is a
key sampling parameter which defines the space on which
each observation is made; it is most directly related to the
point spread function in remote sensing, and is represented
as an image pixel [22]. The combination and arrangement of
pixels as an image defines the spatial resolution of the image.
Fusion approaches are often concerned with the combination
of two or more datasets with different spatial resolutions such
as to create a unified dataset at the finest resolution [23]–[25].
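To make these measurement and sampling processes concrete, the following minimal sketch (our own illustration, not from the paper; numpy and scipy are assumed, and all function names and parameter values are hypothetical) simulates a coarse observation of a fine-resolution scene by applying a Gaussian point spread function (sensor model) and then sampling onto a coarser regular grid (image model):

import numpy as np
from scipy.ndimage import gaussian_filter

def coarse_observation(scene, scale=4, psf_sigma=1.5, noise_std=0.01):
    # sensor model: blur the scene with a Gaussian point spread function (PSF)
    blurred = gaussian_filter(scene, sigma=psf_sigma)
    # image model: sample the blurred scene onto a regular grid with a coarser support
    coarse = blurred[::scale, ::scale]
    # additive noise as a crude stand-in for a finite signal-to-noise ratio
    return coarse + np.random.normal(0.0, noise_std, coarse.shape)

fine_scene = np.random.rand(256, 256)           # stand-in for the (unknown) scene
coarse_image = coarse_observation(fine_scene)   # a 64 x 64 observed image

Data fusion methods can then be viewed as (partially) inverting this forward process by combining several such observations of the same scene.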
Fig. 1(a) demonstrates schematically the multiscale nature
(different spatial resolutions) of diverse datasets captured by
spaceborne, airborne, and UAV sensors. In principle, there is a relation between spatial resolution and scene coverage, i.e., data with a coarser spatial resolution (e.g., spaceborne data) have a larger scene coverage, while data with a finer spatial resolution (e.g., UAV data) have more limited coverage.
All data fusion methods attempt to overcome the above
measurement and sampling processes, which fundamentally
limit the amount of information transferred from the scene to
any one particular dataset. Indeed, in most cases of data fusion
in remote sensing the different datasets to be fused derive in
different ways from the same scene model, at least as defined
in a specific space-time dimension and with specific measur-
able properties (e.g., land cover objects, topographic surface).
Understanding these measurement and sampling processes is,
therefore, key to characterizing methods of data fusion since
each operates on different parts of the sequence from scene
model to data. For example, it is equally possible to perform
the data fusion process in the scene space (e.g., via some data
generating model such as a geometric model) as in the data
space (the more common approach) [21].
Finally, we define the “statistical model” framework as
including: (i) measurement to provide data, as described above,
(ii) characterization of the data through model fitting, (iii)
prediction of unobserved data given (ii), and (iv) forecasting
[26]. (i), (ii), and (iii) are defined in space or space-time, while
(iv) extends through time beyond the range of the current data.

Fig. 1: (a) The multiscale nature of diverse datasets captured by multiple sensors (spaceborne, airborne, and UAV) in Namibia [14]; (b) the trade-off between spectral and spatial resolutions; (c) elevation information obtained by LiDAR sensors from the University of Houston; (d) time-series data analysis for assessing the dynamics of change using RGB and urban images captured from 2001 to 2006 in Dubai.
Prediction (iii) can be of the measured property x (e.g., re-
flectance or topographic elevation, through interpolation) or it
can be of a property of interest y to which the measured x data
are related (e.g., land cover or vegetation biomass, through
classification or regression-type approaches). Similarly, data
fusion can be undertaken on x or it can be applied to predict
y from x. Generally, therefore, data fusion is applied either
between (ii) and (iii) (e.g., fusion of x based on the model in
(ii)), as part of prediction (e.g., fusion to predict y) or after
prediction of certain variables (e.g., ensemble unification). In
this paper, the focus is on data fusion to predict x.
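As a toy illustration of the distinction between predicting x and predicting y (our own sketch; numpy and scikit-learn are assumed, and all names and array shapes are hypothetical), feature-level fusion for predicting y could simply stack co-registered measurements from two sensors as per-pixel features and train a classifier:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# hypothetical co-registered inputs on a common grid: hyperspectral reflectance
# (H, W, B) and a LiDAR-derived elevation layer (H, W), plus a reference land
# cover map used as training labels y
hs = np.random.rand(100, 100, 30)
elevation = np.random.rand(100, 100)
labels = np.random.randint(0, 5, (100, 100))

# fused feature vector x per pixel: spectral bands plus elevation
features = np.concatenate([hs, elevation[:, :, None]], axis=2).reshape(-1, 31)
clf = RandomForestClassifier(n_estimators=100).fit(features, labels.ravel())
land_cover = clf.predict(features).reshape(100, 100)   # predicted y

Fusion to predict x, the focus of this paper, instead reconstructs the measured property itself (e.g., fine-resolution reflectance), as in the pansharpening methods of Section II.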
Data fusion is made possible because each dataset to be
fused represents a different view of the same real world defined
in space and time (generalized by the scene model), with
each view having its own measurable properties, measurement
processes, and sampling processes. Therefore, crucially, one
should expect some level of coherence between the real world
(the source) and the multiple datasets (the observations), as
well as between the datasets themselves, and this is the basis
of most data fusion methods. This concept of coherence is
central to data fusion [27].
Attempts to fuse datasets are potentially aided by knowledge
of the structure of the real world. The real world is spatially
correlated, at least at some scale [28] and this phenomenon
has been used in many algorithms (e.g., geostatistical models
[27]). Moreover, the real world is often comprised of func-
tional objects (e.g., residential houses, roads) that have expec-
tations around their sizes and shapes, and such expectations
can aid in defining objective functions (i.e., in optimization
solutions) [29]. These sources of prior information (on real
world structure) constrain the space of possible fusion solu-
tions beyond the data themselves.
Many key application domains stand to benefit from data fu-
sion processing. For example, there exists a very large number
of applications where an increase in spatial resolution would
add utility, which is the focus of Section II of this paper. These include land cover classification, urban-rural definition, target identification, geological mapping, and so on (e.g., [30]). A large focus of attention currently is on the specific problem that arises from the trade-off in remote sensing between spatial resolution and temporal frequency; in particular, the fusion of coarse-spatial-fine-temporal-resolution with fine-spatial-coarse-temporal-resolution space-time datasets so as to provide frequent data with a fine spatial resolution [31]–[34], which will be detailed in Sections II and V of this paper. Land cover classification, which is one of the most vibrant fields of research in the remote sensing community [35], [36] and attempts to differentiate between the land cover classes present in a scene, can benefit substantially from data fusion. Another
example is the trade-off between spatial resolution and spectral
resolution (Fig. 1(b)) to produce fine-spectral-spatial resolution
images, which play an important role in land cover classification and geological mapping. As can be seen in Fig. 1(b), both
fine spectral and spatial resolutions are required to provide
detailed spectral information and avoid the “mixed-pixel”
phenomenon at the same time. Further information about
this topic can be found in Section II. Elevation information
provided by LiDAR and TLS (see Fig. 1(c)) can be used
in addition to optical data to further increase classification
and mapping accuracy, in particular for classes of objects that are made up of the same materials (e.g., grassland,
shrubs, and trees). Therefore, Sections III and IV of this paper
are dedicated to the topic of elevation data fusion and their
integration with passive data. Furthermore, new sources of
ancillary data obtained from social media, crowd sourcing,
and scraping the internet can be used as additional sources
of information together with airborne and spaceborne data
for smart city and smart environment applications as well as

hazard monitoring and identification. This young, yet active,
field of research is the focus of Section VI.
Many applications can benefit from fused fine-resolution,
time-series datasets, particularly those that involve seasonal
or rapid changes, which will be elaborated in Section V.
Fig. 1(d) shows the dynamics of change for an area in Dubai from 2001 to 2006 using time-series RGB and urban images. For example, monitoring of vegetation phenology (the seasonal growing pattern of plants) is crucial for monitoring deforestation [37] and for crop yield forecasting, which mitigates against food insecurity globally, as well as for monitoring natural hazards (e.g., earthquakes, landslides) and illegal activities such as pollution (e.g., oil spills, chemical leakages). However, such information is
provided globally only at very coarse resolution, meaning that
local smallholder farmers cannot benefit from such knowledge.
Data fusion can be used to provide frequent data needed for
phenology monitoring, but at a fine spatial resolution that
is relevant to local farmers [38]. Similar arguments can be
applied to deforestation where frequent, fine resolution data
may aid in speeding up the timing of government interventions
[37], [39]. The case for fused data is arguably even greater
for rapid change events; for example, forest fires and floods.
In these circumstances, the argument for frequent updates at
fine resolution is obvious. While these application domains
provide compelling arguments for data fusion, there exist
many challenges including: (i) the data volumes produced at
coarse resolution via sensors such as MODIS and MERIS
are already vast, meaning that fusion of datasets most likely
needs to be undertaken on a case-by-case basis as an on-
demand service and (ii) rapid change events require ultra-fast
processing meaning that speed may outweigh accuracy in such
cases [40]. In summary, data fusion approaches in remote
sensing vary greatly depending on the many considerations
described above, including the sources of the datasets to
be fused. In the following sections, we review data fusion
approaches in remote sensing according only to the data sources to be fused, but the further considerations introduced above are relevant in each section.
The remainder of this review is divided into the following sections. First, we review pansharpening and resolution enhancement approaches in Section II. We then discuss point cloud data fusion in Section III. Section IV is devoted to hyperspectral and LiDAR data fusion. Section V presents an overview of multitemporal data fusion. Major recent advances in big data and social media fusion are presented in Section VI. Finally, Section VII draws conclusions.
II. PANSHARPENING AND RESOLUTION ENHANCEMENT
Optical Earth observation satellites have trade-offs in spa-
tial, spectral, and temporal resolutions. Enormous efforts have
been made to develop data fusion techniques for reconstructing
synthetic data that have the advantages of different sensors.
Depending on which pair of resolutions is traded off, these technologies can be divided into two categories: (1) spatio-spectral fusion, to merge fine spatial and fine spectral resolutions [see Fig. 2(a)]; and (2) spatio-temporal fusion, to blend fine spatial and fine temporal resolutions [see Fig. 2(b)]. This section provides an overview of these technologies and their recent advances.
Fig. 2: Schematic illustrations of (a) spatio-spectral fusion and (b) spatio-temporal fusion.
A. Spatio-spectral fusion
Satellite sensors such as WorldView and Landsat ETM+ can
observe the Earth’s surface at different spatial resolutions in
different wavelengths. For example, the spatial resolution of
the eight-band WorldView multispectral image is 2 m, but the
single band panchromatic (PAN) image has a spatial resolution
of 0.5 m. Spatio-spectral fusion is a technique to fuse the fine
spatial resolution images (e.g., 0.5 m WorldView PAN image)
with coarse spatial resolution images (e.g., 2 m WorldView
multispectral image) to create fine spatial resolution images for
all bands. Spatio-spectral fusion is also termed pan-sharpening
when the available fine spatial resolution image is a single
PAN image. When multiple fine spatial resolution bands are
available, spatio-spectral fusion is referred to as multiband im-
age fusion, where two optical images with a trade-off between
spatial and spectral resolutions are fused to reconstruct fine-
spatial and fine-spectral resolution imagery. Multiband image
fusion tasks include multiresolution image fusion of single-
satellite multispectral data (e.g., MODIS and Sentinel-2) and
hyperspectral and multispectral data fusion [41].

Fig. 3: The history of the representative literature of five
approaches in spatio-spectral fusion. The size of each cir-
cle is proportional to the annual average number of ci-
tations. For each category, from left to right, circles cor-
respond to [42]–[50] for CS, [51]–[57] for MRA, [58]–
[61], [27], [62], [31], [63] for Geostatistical, [64]–[69] for
Subspace, and [70]–[72] for Sparse.
Over the past decades, spatio-spectral fusion has motivated
considerable research in the remote sensing community. Most
spatio-spectral fusion techniques can be categorized into at least one of five approaches: 1) component substitution (CS),
2) multiresolution analysis (MRA), 3) geostatistical analysis,
4) subspace representation, and 5) sparse representation. Fig. 3
shows the history of representative literature with different col-
ors (or rows) representing different categories of techniques.
The size of each circle is proportional to the annual average
number of citations (obtained by Google Scholar on January
20, 2018), which indicates the impact of each approach in the
field. The main concept and characteristics of each category
are described below.
1) Component Substitution: CS-based pan-sharpening
methods spectrally transform the multispectral data into an-
other feature space to separate spatial and spectral information
into different components. Typical transformation techniques
include intensity-hue-saturation (IHS) [44], principal compo-
nent analysis (PCA) [43], and Gram-Schmidt [46] transfor-
mations. Next, the component that is supposed to contain the
spatial information of the multispectral image is substituted
by the PAN image after adjusting the intensity range of
the PAN image to that of the component using histogram
matching. Finally, the inverse transformation is performed on
the modified data to obtain the sharpened image.
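As a concrete illustration, a minimal sketch of CS pan-sharpening in its simple intensity-substitution (IHS-like) form is given below (our own code, not from the paper; numpy is assumed, function and variable names are hypothetical, and mean/standard-deviation matching is used in place of full histogram matching). The multispectral bands are assumed to be upsampled and co-registered to the PAN grid:

import numpy as np

def ihs_pansharpen(ms_up, pan):
    """ms_up: (H, W, B) multispectral image upsampled to the PAN grid;
    pan: (H, W) panchromatic image. Returns the (H, W, B) sharpened image."""
    intensity = ms_up.mean(axis=2)                  # component holding the spatial information
    # adjust the PAN intensity range to that of the component (simple moment matching)
    pan_adj = (pan - pan.mean()) / (pan.std() + 1e-12) * intensity.std() + intensity.mean()
    details = pan_adj - intensity                   # spatial details to be injected
    return ms_up + details[:, :, None]              # equivalent to substituting the intensity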
Aiazzi et al. (2007) proposed the general CS-based pan-
sharpening framework, where various methods based on dif-
ferent transformation techniques can be explained in a unified
way [48]. In this framework, each multispectral band is
sharpened by injecting spatial details obtained as the differ-
ence between the PAN image and a coarse-spatial-resolution
synthetic component multiplied by a band-wise modulation
coefficient. By creating the synthetic component based on linear regression between the PAN image and the multispectral image, the performance of traditional CS-based techniques was greatly improved, mitigating spectral distortion.
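In symbols (a sketch in our own notation, not taken verbatim from [48]), the generalized CS fusion of the k-th band can be written as

\widehat{MS}_k = \widetilde{MS}_k + g_k\,(P - I), \qquad I = \sum_{i=1}^{N} w_i\,\widetilde{MS}_i,

where \widetilde{MS}_k is the k-th multispectral band interpolated to the PAN grid, P is the (histogram-matched) PAN image, I is the coarse-spatial-resolution synthetic component, the weights w_i may be estimated by linear regression of P against the multispectral bands, and g_k is the band-wise modulation (injection) coefficient.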
CS-based fusion techniques have been used widely owing
to the following advantages: i) high fidelity of spatial details
in the output, ii) low computational complexity, and iii)
robustness against misregistration. On the other hand, the
CS methods suffer from global spectral distortions when the
overlap of spectral response functions (SRFs) between the two
sensors is limited.
2) Multiresolution Analysis: As shown in Fig. 3, great
effort has been devoted to the study of MRA-based pan-
sharpening algorithms particularly between 2000 and 2010
and they have been used widely as benchmark methods for
more than ten years. The main concept of MRA-based pan-
sharpening methods is to extract spatial details (or high-
frequency components) from the PAN image and inject the
details multiplied by gain coefficients into the multispectral
data. MRA-based pan-sharpening techniques can be charac-
terized by 1) the algorithm used for obtaining spatial details
(e.g., spatial filtering or multiscale transform), and 2) the
definition of the gain coefficients. Representative MRA-based
fusion techniques are based on box filtering [54], Gaussian
filtering [56], bilateral filtering [73], wavelet transform [53],
[55], and curvelet transform [57]. The gain coefficients can be
computed either locally or globally.
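A minimal sketch of the MRA principle (ours, not a reference implementation; numpy and scipy are assumed, and names are hypothetical) using a Gaussian low-pass filter and one global gain coefficient per band:

import numpy as np
from scipy.ndimage import gaussian_filter

def mra_pansharpen(ms_up, pan, sigma=2.0):
    """ms_up: (H, W, B) multispectral image upsampled to the PAN grid;
    pan: (H, W) panchromatic image. Returns the (H, W, B) sharpened image."""
    pan_low = gaussian_filter(pan, sigma=sigma)      # low-pass (e.g., MTF-like) version of PAN
    details = pan - pan_low                          # high-frequency spatial details
    sharpened = np.empty_like(ms_up)
    for k in range(ms_up.shape[2]):
        band = ms_up[:, :, k]
        gain = band.std() / (pan_low.std() + 1e-12)  # one simple global gain per band
        sharpened[:, :, k] = band + gain * details
    return sharpened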
Selva et al. (2015) proposed a general framework called hy-
persharpening that extends MRA-based pan-sharpening meth-
ods to multiband image fusion by creating a fine spatial
resolution synthetic image for each coarse spatial resolution
band as a linear combination of fine spatial resolution bands
based on linear regression [74].
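A rough sketch of this regression step for a single coarse band (our simplification with hypothetical names; numpy is assumed) could look as follows:

import numpy as np

def synthetic_fine_band(coarse_band, fine_bands_at_coarse, fine_bands):
    """coarse_band: (h, w); fine_bands_at_coarse: (h, w, B) fine-resolution bands
    degraded to the coarse grid; fine_bands: (H, W, B). Returns an (H, W) synthetic
    image used to drive the MRA detail extraction for this coarse band."""
    A = fine_bands_at_coarse.reshape(-1, fine_bands_at_coarse.shape[2])
    A = np.hstack([A, np.ones((A.shape[0], 1))])     # linear regression with an intercept
    coeffs, *_ = np.linalg.lstsq(A, coarse_band.ravel(), rcond=None)
    H, W, B = fine_bands.shape
    X = np.hstack([fine_bands.reshape(-1, B), np.ones((H * W, 1))])
    return (X @ coeffs).reshape(H, W)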
The main advantage of MRA-based fusion techniques is their spectral consistency. In other words, if the fused image is degraded in the spatial domain, the degraded image is spectrally consistent with the input coarse-spatial and fine-spectral resolution image. The main shortcoming is that their computational complexity is greater than that of CS-based techniques.
3) Geostatistical Analysis: Geostatistical solutions provide
another family of approaches for spatio-spectral fusion. This
type of approach can preserve the spectral properties of the
original coarse images. That is, when the downscaled predic-
tion is upscaled to the original coarse spatial resolution, the
result is identical to the original one (i.e., perfect coherence).
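This coherence property is easy to check; the small sketch below (ours, with hypothetical names; numpy is assumed) aggregates a downscaled prediction back to the coarse grid by block averaging and compares it with the original coarse band:

import numpy as np

def upscale_by_block_mean(fine_image, scale):
    # average non-overlapping scale x scale blocks of an (H, W) image
    H, W = fine_image.shape
    return fine_image.reshape(H // scale, scale, W // scale, scale).mean(axis=(1, 3))

def is_coherent(fine_prediction, coarse_image, scale, tol=1e-6):
    return np.allclose(upscale_by_block_mean(fine_prediction, scale), coarse_image, atol=tol)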
Pardo-Iguzquiza et al. [58] developed a downscaling cokriging
(DSCK) method to fuse the Landsat ETM+ multispectral
images with the PAN image. DSCK treats each multispectral
image as the primary variable and the PAN image as the
secondary variable. DSCK was extended with a spatially
adaptive filtering scheme [60], in which the cokriging weights
are determined on a pixel basis, rather than being fixed in
the original DSCK. Atkinson et al. [59] extended DSCK to downscale the multispectral bands to a spatial resolution finer than that of any of the input images, including the PAN image. DSCK is
a one-step method, and it involves auto-semivariogram and
cross-semivariogram modeling for each coarse band [61].
Sales et al. [61] developed a kriging with external drift
(KED) method to fuse 250 m Moderate Resolution Imaging
Spectroradiometer (MODIS) bands 1-2 with 500 m bands
3-7. KED requires only auto-semivariogram modeling for
the observed coarse band and simplifies the semivariogram
modeling procedure, which makes it easier to implement
than DSCK. As acknowledged in Sales et al. [61], however, KED suffers from a high computational cost, as it computes
