What is the result of the restoration of the d image?

The restoration result shown in (d) is with much less highlight and shadow, which is impossible to achieve by gradient transfer or joint filtering.

What is the s map of the image?

Their estimated s map shown in (c) contains large values along object boundaries, and has close-to-zero values for highlight and shadow.

What is the method for restoring a color image?

Since the two input images are color ones under visible light, the authors use each channel from the flash image to guide image restoration in the corresponding channel of the nonflash noisy image.

What is the key to the structure of G?

The authors introduce an auxiliary map s with the same size as G, which is key to their method, to adapt structure of G to that of I∗ – the ground truth noise-free image.

What is the limitation of their current method?

The limitation of their current method lies in the situation where the guidance does not exist, corresponding to zero ∇G and non-zero ∇I∗ pixels.

What is the main advantage of using flash to restore a color image?

This enables a configuration to take an NIR image with less noisy details by dark flash [11] to guide corresponding noisy color image restoration.

What is the function used to remove outliers?

Further to avoid the extreme situation when $\nabla_x G_i$ or $\nabla_y G_i$ is close to zero, and enlist the ability to reject outliers, the authors define their data term as

$$E_1(s, I) = \sum_i \Big( \rho(|s_i - p_{i,x}\nabla_x I_i|) + \rho(|s_i - p_{i,y}\nabla_y I_i|) \Big), \qquad (4)$$

where $\rho$ is a robust function defined as

$$\rho(x) = |x|^{\alpha}, \quad 0 < \alpha < 1. \qquad (5)$$

It is used to remove estimation outliers.

What is the main advantage of using flash?

In previous methods, Krishnan et al. [11] used gradients of a dark-flashed image, capturing ultraviolet (UV) and NIR light to guide noise removal in the color image.

What is the simplest way to solve a linear system?

The final linear system in the matrix form is

$$\Big( C_x^T (P_x)^2 A_x^{t+1,t} C_x + C_y^T (P_y)^2 A_y^{t+1,t} C_y + \lambda B^{t+1,t} \Big) \mathbf{I} = \big( C_x^T P_x A_x^{t+1,t} + C_y^T P_y A_y^{t+1,t} \big) \mathbf{s} + \lambda B^{t+1,t} \mathbf{I}_0. \qquad (23)$$

The linear system is also solved using PCG and the solution is denoted as $I^{(t+1)}$.

(Open Access) Cross-Field Joint Image Restoration via Scale Map (2013) | Qiong Yan

Q: What are the three common IRLS terms?

Among them, Ax, Ay and B account for the re-weighting process and are typically computed using estimates from previous iterations – Px and Py are normalization terms from the guidance image.

Q: Why did Krishnan and Zhang develop a method to enhance color images?

Because of the popularity of other imaging devices, more computational photography and computer vision solutions based on images captured under different configurations were developed.

Q: What is the simplest way to solve the non-convex function E(s, I)?

To solve the non-convex function E(s, I) defined in Eq. (14), the authors employ the iterative reweighted least squares (IRLS), which make it possible to convert the original problem to a few corresponding linear systems without losing generality.

Q: What is the effect of the iterative method?

The authors contrarily propose an iterative method, which finds constraints to shape the s map according to its characteristics and yields the effect to remove intensive noise from input I0.

Cross-Field Joint Image Restoration via Scale Map

Qiong Yan, Xiaoyong Shen, Li Xu, Shaojie Zhuo†, Xiaopeng Zhang†, Liang Shen†, Jiaya Jia

The Chinese University of Hong Kong

†Qualcomm Incorporated

http://www.cse.cuhk.edu.hk/leojia/projects/crossfield/

Abstract

Color, infrared, and flash images captured in different fields can be employed to effectively eliminate noise and other visual artifacts. We propose a two-image restoration framework considering input images in different fields, for example, one noisy color image and one dark-flashed near-infrared image. The major issue in such a framework is to handle structure divergence and find commonly usable edges and smooth transition for visually compelling image reconstruction. We introduce a scale map as a competent representation to explicitly model derivative-level confidence and propose new functions and a numerical solver to effectively infer it following new structural observations. Our method is general and shows a principled way for cross-field restoration.

1. Introduction

Images captured in dim light are hardly satisfactory. They could be very noisy when increasing ISO in a short exposure duration. Using flash might improve lighting, but it creates unwanted shadow and highlight, or changes the tone of the image. The methods of [6, 14, 1] restore a color image based on flash and non-flash inputs of the same scene. Recently, because of the popularity of other imaging devices, more computational photography and computer vision solutions based on images captured under different configurations were developed.

For example, near infrared (NIR) images have a single channel recording infrared light reflected from objects, with a spectrum ranging from 700nm to 1000nm in wavelength. NIR images contain many structures similar to those of visible color images when taken from the same camera position. This enables a configuration to take an NIR image with less noisy details by dark flash [11] to guide corresponding noisy color image restoration. The main advantage is that only NIR flash, invisible to naked human eyes, is used, making it a suitable way for daily portrait photography and of remarkable practical importance.

Figure 1. Appearance comparison of RGB and NIR images. (a) RGB image. (b) Corresponding NIR image. (c) Close-ups. The four columns are for the R, G, B, and NIR channels respectively.

In previous methods, Krishnan et al. [11] used gradients of a dark-flashed image, capturing ultraviolet (UV) and NIR light to guide noise removal in the color image. Considering rich details in NIR images, Zhang et al. [20] enhanced the RGB counterpart by transferring contrast and details via Haar wavelets. In [21] and [16], the detail layer was manipulated differently for RGB and haze image enhancement. Several methods also explore other image fusion applications in two-image deblurring [19], matting [17], tone mapping [7], upsampling [10], context enhancement [15], and relighting [2], to name a few. Bhat et al. [3] proposed GradientShop to edit gradients, which can also be used to enhance images.

We note existing methods work well for their respective applications by handling different detail layers or gradients from multiple images. But in terms of two-image high-quality restoration, there remain a few major and fundamental issues that were not sufficiently addressed. We take the RGB-NIR images shown in Fig. 1 as an example to reveal the noticeable difference in detail distribution and intensity formation. Structure inconsistency existing for many pixels can be categorized as follows.

• Gradient Magnitude Variation. In the first row of Fig. 1(c), letter “D” is with different contrast. It is due to varied reflectance to infrared and visible light.

• Gradient Direction Divergence. In the second row, edge gradients have opposite directions in the two images, which cause structural deviation.

• Gradient Loss. In the last row, the characters are completely lost in the NIR image.

• Shadow and Highlight by Flash. If one uses flash only for the NIR image, it inevitably generates highlight/shadow that is not contained in the other image. Examples are presented later.

These issues are caused by inherent discrepancy of structures in different types of images, which we call cross-field problems. The algorithms to address them can be generally referred to as cross-field image restoration. Simple joint image filtering [18, 8] could blur weak edges due to the inherent smoothing property. Directly transferring guidance gradients to the noisy field also results in unnatural appearance.

In this paper, we propose a framework via novel scale map construction. This map captures the nature of structure discrepancy between images and has clear statistical and numerical meanings. Based on its analysis, we design functions to form an optimal scale map considering adaptive smoothing, edge preservation, and guidance strength manipulation. Aforementioned cross-field issues are discussed and addressed in this framework. We also develop an effective solver via robust function approximation and problem decomposition, which converges in fewer than 5 passes, compared to other gradient descent alternatives that may need tens or hundreds of iterations.

2. Modeling and Formulation

Our system takes the input of a noisy RGB image $I_0$ and a guidance image $G$ captured from the same camera position. $G$ can be a dark-flashed NIR image or others with possible structure variation as discussed above. Other cross-field configurations are allowed in our framework, presented in Section 4. Pixel values in each channel are scaled to [0, 1]. $G$ and $I_0$ could have different numbers of channels. Our goal is to recover an image from $I_0$ with noise removed and structure retained. We process color channels separately.

Figure 2. Optimal scale map s computed from images in Fig. 1 according to Eq. (1). Dark to bright pixels correspond to negative to positive values in different scales.

Figure 3. 1D illustration. (a) Patch in the color image, NIR image and s map. Plot (b) contains gradients along the vertical line in the top two patches. (c) shows corresponding s values. Most of them are zeros; positive and negative values also exist. Panel titles: (a) 2D Images, (b) 1D Signal of Gradient, (c) s Map.

We introduce an auxiliary map $s$ with the same size as $G$, which is key to our method, to adapt the structure of $G$ to that of $I^*$, the ground truth noise-free image. The $s$ map is defined under condition

$$\min \|\nabla I^* - s \cdot \nabla G\|. \qquad (1)$$

Here $\nabla$ is an operator forming a vector with x- and y-direction gradients. Each element $s_i$ in map $s$, where $i$ indexes pixels, is a scalar, measuring robust difference between corresponding gradients in the two images. Simply put, $s$ is a ratio map between the guidance and latent images. The optimal $s$ corresponding to the cross-field example in Fig. 1 is shown in Fig. 2, visualized as a color image after pixel-wise value normalization to [0, 1].

We analyze the properties of $s$ with regard to structure discrepancy between $\nabla G$ and $\nabla I^*$, and present them as follows with the illustration in Fig. 3.

Property of s. First, the sign of each $s_i$ can be either positive or negative. A negative $s_i$ means edges exist in the two images, but with opposite directions, as demonstrated in Fig. 3(c). Second, when the guidance image $G$ contains extra shadow and highlight caused by flash, which are absent in $\nabla I^*$, $s_i$ with value 0 can help ignore them. Finally, $s_i$ can be any value when $\nabla G_i = 0$, that is, when the guidance edge does not exist, such as the red letters in Fig. 3(a). In this case, under local smoothness, $s_i$ being 0 is a good choice.

In short, an optimal $s$ map should be able to represent all these structure discrepancies. It is the first of its kind to serve cross-field restoration. Its additional benefit is the special role as latent variables to develop an efficient optimization procedure.

More of the Function. We denote by $I$ our estimate towards $I^*$. Eq. (1) is updated to

$$\min \|\nabla I - s \cdot \nabla G\|. \qquad (2)$$

As it involves unknowns $\nabla I$ and $s$, which correlate, the function is ill-posed. We take its variation as a data term expression, together with regularization on $s$, to construct an objective function.

2.1. Data Term about s

In $|s_i \nabla G_i - \nabla I_i|$, where $i$ indexes pixels, $\nabla G_i$ can be analogously regarded as a scale map for $s_i$ due to the dual relation between $s_i$ and $\nabla G_i$. It controls the penalty when computing $s_i$ for different pixels. The final cost resulting from $|s_i \nabla G_i - \nabla I_i|$ is dependent on the value of $\nabla G_i$. For example, if $\nabla G_i$ and $\nabla I_i$ are doubled simultaneously, although $s_i$ remains the same, the cost from $|s_i \nabla G_i - \nabla I_i|$ will be twice as large.

To stabilize costs w.r.t. $s_i$, we perform normalization

$$\sum_i \left( \left| s_i - \frac{\nabla_x I_i}{\nabla_x G_i} \right| + \left| s_i - \frac{\nabla_y I_i}{\nabla_y G_i} \right| \right), \qquad (3)$$

which is modulated by the two components of $\nabla G_i$. It removes the unexpected scaling effect caused by $\nabla G_i$. Further to avoid the extreme situation when $\nabla_x G_i$ or $\nabla_y G_i$ is close to zero, and enlist the ability to reject outliers, we define our data term as

$$E_1(s, I) = \sum_i \Big( \rho(|s_i - p_{i,x}\nabla_x I_i|) + \rho(|s_i - p_{i,y}\nabla_y I_i|) \Big), \qquad (4)$$

where $\rho$ is a robust function defined as

$$\rho(x) = |x|^{\alpha}, \quad 0 < \alpha < 1. \qquad (5)$$

It is used to remove estimation outliers. We set $\alpha = 0.9$ in experiments. $p_{i,k}$, where $k \in \{x, y\}$, is a truncation function

$$p_{i,k} = \frac{1}{\mathrm{sign}(\nabla_k G_i) \cdot \max(|\nabla_k G_i|, \varepsilon)}, \qquad (6)$$

where $\mathrm{sign}(\cdot)$ is the sign operator, outputting 1 if $\nabla_k G_i$ is positive or zero and outputting -1 otherwise. $\max(|\nabla_k G_i|, \varepsilon)$ returns the larger value between $|\nabla_k G_i|$ and $\varepsilon$. The threshold $\varepsilon$ is used to avoid division by zero and is set to 0.004 empirically.

Figure 4. Isotropic versus anisotropic smoothing of the s map. (a) Isotropic smoothing. (b) Anisotropic smoothing. Result in (b) from anisotropic smoothing contains higher contrast structure. The input images are shown in Fig. 5(a).
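The per-pixel pieces of Eqs. (4)-(6) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `truncated_inverse`, `rho` and `data_cost` are hypothetical, and gradients are passed as simple `(x, y)` pairs rather than image arrays.

```python
EPS = 0.004    # threshold epsilon from Eq. (6)
ALPHA = 0.9    # exponent alpha reported for the experiments


def truncated_inverse(g, eps=EPS):
    """p_{i,k} from Eq. (6): 1 / (sign(g) * max(|g|, eps)), with sign(0) = +1."""
    sign = 1.0 if g >= 0 else -1.0
    return 1.0 / (sign * max(abs(g), eps))


def rho(x, alpha=ALPHA):
    """Robust penalty from Eq. (5)."""
    return abs(x) ** alpha


def data_cost(s, grad_i, grad_g):
    """One pixel's contribution to E_1 in Eq. (4).

    grad_i and grad_g are (x, y) gradient pairs of I and G at that pixel
    (a hypothetical calling convention chosen for this sketch).
    """
    return sum(rho(s - truncated_inverse(g) * d) for d, g in zip(grad_i, grad_g))
```

Because the cost is expressed through the ratio $p_{i,k}\nabla_k I_i$, doubling both gradients leaves it unchanged, which is the scaling effect the normalization is meant to remove: `data_cost(1.0, (0.2, 0.4), (0.1, 0.2))` and `data_cost(1.0, (0.4, 0.8), (0.2, 0.4))` agree up to rounding.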

2.2. Data Term for I

The data term for $I$ is simply set as

$$E_2(I) = \sum_i \rho(|I_i - I_{0,i}|), \qquad (7)$$

where $\rho$ is the same robust function and $I_{0,i}$ is the color of pixel $i$ in $I_0$. $E_2(I)$ requires the restoration result not to wildly deviate from the input noisy image $I_0$, especially along salient edges. The robust function $\rho$ helps reject part of the noise from $I_0$.

2.3. Regularization Term

Our regularization term is defined with anisotropic gradient tensors [13, 4]. It is based on the fact that $s$ values are similar locally only in certain directions. For instance, $s$ values should change smoothly or be constant along an edge more than those across it. As shown in Fig. 4, uniformly smoothing $s$ in all directions blurs sharp edges.

Our anisotropic tensor scheme preserves sharp edges according to gradient directions of $G$. By a few algebraic operations, an anisotropic tensor is expressed as

$$D(\nabla G_i) = \frac{1}{(\nabla G_i)^2 + 2\eta^2} \Big( (\nabla G_i^{\perp})(\nabla G_i^{\perp})^T + \eta^2 \mathbf{1} \Big), \qquad (8)$$

where $\nabla G_i^{\perp} = (\nabla_y G_i, -\nabla_x G_i)^T$ is a vector perpendicular to $\nabla G_i$, $\mathbf{1}$ is an identity matrix and scalar $\eta$ controls the isotropic smoothness. When $\nabla G_i$ is much smaller than $\eta$, Eq. (8) degrades to $0.5 \cdot \mathbf{1}$ and the structure tensor is therefore isotropic.

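Generally, the two orthogonal eigenvectors of $D(\nabla G_i)$ are

$$v_{i,1} = \frac{\nabla G_i}{|\nabla G_i|}, \qquad v_{i,2} = \frac{\nabla G_i^{\perp}}{|\nabla G_i|}, \qquad (9)$$

with corresponding eigenvalues

$$\mu_{i,1} = \frac{\eta^2}{(\nabla G_i)^2 + 2\eta^2}, \qquad \mu_{i,2} = \frac{(\nabla G_i)^2 + \eta^2}{(\nabla G_i)^2 + 2\eta^2}. \qquad (10)$$

This decomposes the tensor to

$$D(\nabla G_i) = \begin{pmatrix} v_{i,1} & v_{i,2} \end{pmatrix} \begin{pmatrix} \mu_{i,1} & 0 \\ 0 & \mu_{i,2} \end{pmatrix} \begin{pmatrix} v_{i,1}^T \\ v_{i,2}^T \end{pmatrix}. \qquad (11)$$

This form makes it possible to express regularization for each $\nabla s_i$ as

$$E_3(\nabla s_i) = \mu_{i,1} (v_{i,1}^T \nabla s_i)^2 + \mu_{i,2} (v_{i,2}^T \nabla s_i)^2. \qquad (12)$$

Different smoothing penalties are controlled by $\mu_{i,1}$ and $\mu_{i,2}$ in directions $v_{i,1}$ and $v_{i,2}$, across and along edges respectively. Stronger smoothness is naturally imposed along edges. The final smoothing term is thus defined as

$$E_3(\nabla s) = \sum_i \Big( \mu_{i,1} (v_{i,1}^T \nabla s_i)^2 + \mu_{i,2} (v_{i,2}^T \nabla s_i)^2 \Big). \qquad (13)$$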
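To make the tensor concrete, the following plain-Python sketch evaluates Eqs. (8), (10) and (12) at a single pixel. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the value of $\eta$ used in any call is an arbitrary choice, since this section does not report one.

```python
def tensor_eigen(gx, gy, eta):
    """Eigenvalues (mu1, mu2) of D(grad G_i), following Eq. (10)."""
    g2 = gx * gx + gy * gy
    denom = g2 + 2.0 * eta * eta
    mu1 = eta * eta / denom            # direction of grad G (across the edge)
    mu2 = (g2 + eta * eta) / denom     # perpendicular direction (along the edge)
    return mu1, mu2


def tensor_matrix(gx, gy, eta):
    """2x2 tensor D(grad G_i) built directly from Eq. (8)."""
    px, py = gy, -gx                   # perpendicular vector (grad_y G, -grad_x G)
    denom = gx * gx + gy * gy + 2.0 * eta * eta
    return [[(px * px + eta * eta) / denom, (px * py) / denom],
            [(py * px) / denom, (py * py + eta * eta) / denom]]


def smoothness_cost(gx, gy, sx, sy, eta):
    """E_3 for one pixel, Eq. (12), evaluated as grad_s^T D grad_s."""
    d = tensor_matrix(gx, gy, eta)
    return sx * (d[0][0] * sx + d[0][1] * sy) + sy * (d[1][0] * sx + d[1][1] * sy)
```

For a guidance gradient pointing in x, for example `(gx, gy) = (1.0, 0.0)` with an assumed `eta = 0.1`, a unit change of $s$ across the edge costs $\mu_{i,1}$ while the same change along the edge costs the much larger $\mu_{i,2}$. With a zero guidance gradient both eigenvalues equal 0.5, matching the isotropic limit noted after Eq. (8).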

2.4. Final Objective Function

The final objective function to estimate the $s$ map and restore image $I$ is written as

$$E(s, I) = E_1(s, I) + \lambda E_2(I) + \beta E_3(\nabla s), \qquad (14)$$

where $\lambda$ controls the confidence on noisy image $I_0$, and $\beta$ corresponds to smoothness of $s$. We describe their setting in Section 4.

This objective function is non-convex due to the involvement of sparsity terms. Joint representation for $s$ and $I$ in optimization further complicates it. Naive gradient descent cannot guarantee optimality and leads to very slow convergence even for a local minimum. In contrast, we propose an iterative method, which finds constraints to shape the $s$ map according to its characteristics and yields the effect to remove intensive noise from input $I_0$.

3. Numerical Solution

To solve the non-convex function $E(s, I)$ defined in Eq. (14), we employ iteratively reweighted least squares (IRLS), which makes it possible to convert the original problem to a few corresponding linear systems without losing generality. This process, however, is still nontrivial and needs a few derivations.

Initially, robust function $\rho(x)$ in Eq. (5) for any scalar $x$ can be written as $x^2 / |x|^{2-\alpha}$, further approximated as

$$\rho(x) \approx \phi(x) \cdot x^2, \qquad (15)$$

where $\phi(x)$ is defined as

$$\phi(x) = \frac{1}{|x|^{2-\alpha} + \epsilon}. \qquad (16)$$

$\epsilon$ is a small number to avoid division by 0. We set it to $10^{-4}$ empirically. This form splits the robust function into two parts where $\phi(x)$ can be regarded as a weight for $x^2$. In our method, following the tradition of IRLS, $\phi(x)$ and $x^2$ are updated alternately during optimization because each of them can work together with other necessary terms to form simpler representations, benefiting optimization.
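As a small illustration of the reweighting idea in Eqs. (15)-(16), the sketch below applies it to a deliberately simple scalar problem, not to the paper's objective: minimizing $\sum_j \rho(x - d_j)$ over one unknown $x$. The function `irls_location`, the iteration count and the toy problem itself are assumptions made for this example.

```python
ALPHA = 0.9
SMALL = 1e-4   # the small constant in Eq. (16)


def rho(x, alpha=ALPHA):
    """Robust penalty |x|^alpha from Eq. (5)."""
    return abs(x) ** alpha


def phi(x, alpha=ALPHA, small=SMALL):
    """IRLS weight from Eq. (16)."""
    return 1.0 / (abs(x) ** (2.0 - alpha) + small)


def irls_location(samples, iterations=20):
    """Approximately minimise sum_j rho(x - d_j) over a scalar x.

    Each pass fixes the weights phi at the previous estimate and then
    solves the resulting weighted least-squares problem in closed form.
    """
    x = sum(samples) / len(samples)                 # plain least-squares start
    for _ in range(iterations):
        weights = [phi(x - d) for d in samples]     # weights from previous x
        x = sum(w * d for w, d in zip(weights, samples)) / sum(weights)
    return x
```

With samples such as `[1.0, 1.02, 0.98, 5.0]`, the ordinary mean is 2.0, while the reweighted estimate settles near the cluster around 1 because the weight of the distant sample shrinks as the estimate moves away from it. This is the outlier-rejecting behaviour the robust function is introduced for.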

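Vector Form. To ease derivation, we rewrite Eq. (14) in the vector form by taking the expression in Eq. (15) into computation. It yields

$$E(\mathbf{s}, \mathbf{I}) = (\mathbf{s} - P_x C_x \mathbf{I})^T A_x (\mathbf{s} - P_x C_x \mathbf{I}) + (\mathbf{s} - P_y C_y \mathbf{I})^T A_y (\mathbf{s} - P_y C_y \mathbf{I}) + \lambda (\mathbf{I} - \mathbf{I}_0)^T B (\mathbf{I} - \mathbf{I}_0) + \beta \mathbf{s}^T L \mathbf{s}, \qquad (17)$$

where $\mathbf{s}$, $\mathbf{I}$, and $\mathbf{I}_0$ are vector representations of $s$, $I$, and $I_0$. $C_x$ and $C_y$ are discrete backward difference matrices that are used to compute image gradients in the x- and y-directions. $P_x$, $P_y$, $A_x$, $A_y$ and $B$ are diagonal matrices, whose $i$-th diagonal elements are defined as

$$(P_x)_{ii} = p_{i,x}, \qquad (A_x)_{ii} = \phi(s_i - p_{i,x}\nabla_x I_i),$$
$$(P_y)_{ii} = p_{i,y}, \qquad (A_y)_{ii} = \phi(s_i - p_{i,y}\nabla_y I_i),$$
$$B_{ii} = \phi(I_i - I_{0,i}).$$

Among them, $A_x$, $A_y$ and $B$ account for the re-weighting process and are typically computed using estimates from previous iterations; $P_x$ and $P_y$ are normalization terms from the guidance image. The first three terms in Eq. (17) correspond to terms $E_1$ and $E_2$; $\mathbf{s}^T L \mathbf{s}$ is created by $E_3$.

Note the last term $\mathbf{s}^T L \mathbf{s}$ controls spatial smoothness of $s$, where matrix $L$ is a smoothing Laplacian, expressed as

$$L = C_x^T (\Sigma_1 V_x^2 + \Sigma_2 V_y^2) C_x + C_y^T (\Sigma_2 V_x^2 + \Sigma_1 V_y^2) C_y + 2 C_y^T (\Sigma_1 - \Sigma_2) V_x V_y C_x \qquad (18)$$

after somewhat involved derivations. $\Sigma_1$, $\Sigma_2$, $V_x$, and $V_y$ are all diagonal matrices. Their $i$-th diagonal elements are

$$(\Sigma_1)_{ii} = \mu_{i,1}, \qquad (V_x)_{ii} = \nabla_x G_i / \max(|\nabla G_i|, \varepsilon),$$
$$(\Sigma_2)_{ii} = \mu_{i,2}, \qquad (V_y)_{ii} = \nabla_y G_i / \max(|\nabla G_i|, \varepsilon).$$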

Algorithm 1 Cross-Field Image Restoration.
1: input: noisy image $I_0$, guidance image $G$, parameters $\beta$ and $\lambda$
2: initialize $I \leftarrow I_0$, $s \leftarrow 1$
3: repeat
4:     estimate $s$ according to Eq. (21)
5:     estimate $I$ according to Eq. (23)
6: until convergence
7: output: $s$ map and restored image $I$

Analysis. We note $L$ is actually an inhomogeneous term, reflecting the anisotropic property of our smoothing regularizer. To understand it, consider the extreme case that $\nabla G$ approaches zero. It leads to $\Sigma_1 = \Sigma_2$ and $V_x = V_y = 0$, making $L$ a homogeneous Laplacian. The resulting $s$ map is therefore smooth in all directions. But in natural images, $\nabla G$ on an edge is not isotropic and should be with nonuniform regularization strength. Also, sparse $C_x$ and $C_y$ lead to the sparse Laplacian matrix $L$, which facilitates optimization because many mature sparse-matrix solvers exist in this community already.

3.1. Solver

We solve for $s$ and $I$ based on the above derivations. Results of $s$ and $I$ in each iteration $t$ are denoted as $s^{(t)}$ and $I^{(t)}$. Initially, we set $s^{(0)} = \mathbf{1}$, whose elements are all 1s, and $I^{(0)} = I_0$. By setting all initial $s_i$ to 1, total smoothness is obtained. It yields zero cost for $E_3(s)$, a nice starting point for optimization. This initialization also makes the starting $\nabla I$ the same as $\nabla G$ with many details. Then at iteration $t+1$, we solve two subproblems alternately:

• Given $s^{(t)}$ and $I^{(t)}$, minimize $E(s, I^{(t)})$ to get $s^{(t+1)}$.

• Given $s^{(t+1)}$ and $I^{(t)}$, minimize $E(s^{(t+1)}, I)$ to update $I^{(t+1)}$.

The procedure is repeated until $s$ and $I$ do not change too much. Usually, 4-6 iterations are enough to generate visually compelling results. The algorithm is depicted in Algorithm 1, with the solvers elaborated on as follows.

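Solve for $s^{(t+1)}$. The energy function with respect to $s$ can be expressed as

$$E(\mathbf{s}) = (\mathbf{s} - P_x C_x \mathbf{I})^T A_x (\mathbf{s} - P_x C_x \mathbf{I}) + (\mathbf{s} - P_y C_y \mathbf{I})^T A_y (\mathbf{s} - P_y C_y \mathbf{I}) + \beta \mathbf{s}^T L \mathbf{s}. \qquad (19)$$

Computation of $A_x$ and $A_y$ depends on estimates $s$ and $I$ from the previous iteration. We denote by $A_x^{t,t}$ and $A_y^{t,t}$ the matrices computed with $s^{(t)}$ and $I^{(t)}$, which lead to

$$E(\mathbf{s}) = (\mathbf{s} - P_x C_x \mathbf{I}^{(t)})^T A_x^{t,t} (\mathbf{s} - P_x C_x \mathbf{I}^{(t)}) + (\mathbf{s} - P_y C_y \mathbf{I}^{(t)})^T A_y^{t,t} (\mathbf{s} - P_y C_y \mathbf{I}^{(t)}) + \beta \mathbf{s}^T L \mathbf{s}. \qquad (20)$$

It is simply quadratic. Taking derivatives with respect to $\mathbf{s}$ and setting them to 0, we obtain the sparse linear system

$$(A_x^{t,t} + A_y^{t,t} + \beta L)\, \mathbf{s} = A_x^{t,t} P_x C_x \mathbf{I}^{(t)} + A_y^{t,t} P_y C_y \mathbf{I}^{(t)}. \qquad (21)$$

We solve it using pre-conditioned conjugate gradient (PCG). The solution is denoted as $s^{(t+1)}$.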

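Solve for $I^{(t+1)}$. Similarly, the energy function to solve for $I$ is given by

$$E(\mathbf{I}) = (\mathbf{s}^{(t+1)} - P_x C_x \mathbf{I})^T A_x^{t+1,t} (\mathbf{s}^{(t+1)} - P_x C_x \mathbf{I}) + (\mathbf{s}^{(t+1)} - P_y C_y \mathbf{I})^T A_y^{t+1,t} (\mathbf{s}^{(t+1)} - P_y C_y \mathbf{I}) + \lambda (\mathbf{I} - \mathbf{I}_0)^T B^{t+1,t} (\mathbf{I} - \mathbf{I}_0), \qquad (22)$$

where $A_x^{t+1,t}$ and $A_y^{t+1,t}$ are calculated with available $s^{(t+1)}$ and $I^{(t)}$. $B^{t+1,t}$ depends on $I^{(t)}$. The final linear system in the matrix form is

$$\Big( C_x^T (P_x)^2 A_x^{t+1,t} C_x + C_y^T (P_y)^2 A_y^{t+1,t} C_y + \lambda B^{t+1,t} \Big) \mathbf{I} = \big( C_x^T P_x A_x^{t+1,t} + C_y^T P_y A_y^{t+1,t} \big) \mathbf{s} + \lambda B^{t+1,t} \mathbf{I}_0. \qquad (23)$$

The linear system is also solved using PCG and the solution is denoted as $I^{(t+1)}$.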

3.2. Why Does It Work?

According to the linear system defined in Eq. (21), the resulting $s_i$ for pixel $i$ is a weighted average of $p_{i,x}\nabla_x I_i \approx \nabla_x I_i / \nabla_x G_i$ and $p_{i,y}\nabla_y I_i \approx \nabla_y I_i / \nabla_y G_i$, whose weights are determined by $(A_x)_{ii}$ and $(A_y)_{ii}$. Even if these weights are quite different due to noise or other aforementioned issues described in Section 1, our method can still get a reasonable solution. We explain why this happens.

Assuming $p_{i,x}\nabla_x I_i$ is larger than the other term, in solving for $I$ according to Eq. (23), $s_i$ reduces the gradient in the x-direction and increases the other so that $\nabla I_i$ lies close to $s_i \nabla G_i$. In the meantime, noise is reduced. Then after each iteration, a less noisy $I$ is put into Eq. (21) to produce new $p_{i,x}\nabla_x I_i$ and $p_{i,y}\nabla_y I_i$, which are closer than those in previous iterations.

Eventually when the two estimates meet each other, $s$ converges; $I$ is accordingly optimal. The smoothness term $L$ in Eq. (21) helps avoid discontinuity in the $s$ map along edges of $G$.
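The weighted-average reading of Eq. (21) can be checked at a single pixel. The sketch below makes a simplifying assumption that is not part of the method: it sets $\beta = 0$, so the smoothness term $L$ drops out and Eq. (21) becomes diagonal. The function `update_s_pixel` and its arguments are hypothetical names used only for this illustration.

```python
def phi(x, alpha=0.9, small=1e-4):
    """IRLS weight from Eq. (16)."""
    return 1.0 / (abs(x) ** (2.0 - alpha) + small)


def update_s_pixel(s_prev, ratio_x, ratio_y):
    """Eq. (21) at one pixel under the assumption beta = 0.

    ratio_x and ratio_y stand for p_{i,x} * grad_x I_i and
    p_{i,y} * grad_y I_i computed from the current image estimate.
    """
    a_x = phi(s_prev - ratio_x)   # (A_x)_ii from the previous estimate
    a_y = phi(s_prev - ratio_y)   # (A_y)_ii from the previous estimate
    return (a_x * ratio_x + a_y * ratio_y) / (a_x + a_y)
```

If the previous estimate already agrees with one of the two ratios, that ratio receives a very large weight and the update stays close to it; if the previous estimate sits midway, both ratios are weighted equally. In the full method the $\beta L$ term additionally couples neighbouring pixels, which this per-pixel sketch ignores.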

We show in Fig. 5(e) the initial constant $s$ map. (f)-(g) are maps produced in two iterations, and (h) shows the final $s$. Initially the map is noisy because of confusing or contradictory gradient magnitudes and directions in the
