Patch-Based High Dynamic Range Video
Nima Khademi Kalantari¹   Eli Shechtman²   Connelly Barnes²,³   Soheil Darabi²   Dan B Goldman²   Pradeep Sen¹
¹University of California, Santa Barbara   ²Adobe   ³University of Virginia
[Figure 1 image panels; exposure labels, left to right: High, Low, Middle, High, Low]
Figure 1: (top row) Input video acquired using an off-the-shelf camera, which alternates between three exposures separated by two stops.
(bottom row) Our algorithm reconstructs the missing LDR images and generates an HDR image at each frame. The HDR video result for this
ThrowingTowel3Exp scene can be found in the supplementary materials. This layout is adapted from Kang et al. [2003].
Abstract
Despite significant progress in high dynamic range (HDR) imaging
over the years, it is still difficult to capture high-quality HDR video
with a conventional, off-the-shelf camera. The most practical way
to do this is to capture alternating exposures for every LDR frame
and then use an alignment method based on optical flow to register
the exposures together. However, this results in objectionable arti-
facts whenever there is complex motion and optical flow fails. To
address this problem, we propose a new approach for HDR recon-
struction from alternating exposure video sequences that combines
the advantages of optical flow and recently introduced patch-based
synthesis for HDR images. We use patch-based synthesis to enforce
similarity between adjacent frames, increasing temporal continuity.
To synthesize visually plausible solutions, we enforce constraints
from motion estimation coupled with a search window map that
guides the patch-based synthesis. This results in a novel recon-
struction algorithm that can produce high-quality HDR videos with
a standard camera. Furthermore, our method is able to synthesize
plausible texture and motion in fast-moving regions, where either
patch-based synthesis or optical flow alone would exhibit artifacts.
We present results of our reconstructed HDR video sequences that
are superior to those produced by current approaches.
CR Categories: I.4.1 [Computing Methodologies]: Image Pro-
cessing and Computer Vision—Digitization and Image Capture
Keywords: High dynamic range video, patch-based synthesis
Links: DL PDF WEB
1 Introduction
High dynamic range (HDR) imaging is now popular and becoming
more widespread. Most of the research to date, however, has fo-
cused on improving the capture of still HDR images, while HDR
video capture has received considerably less attention. This is a
serious deficit, since high-quality HDR video would significantly
improve our ability to capture dynamic environments as our eyes
perceive them. The reason for this lack of progress is that the bulk
of HDR video research has focused on specialized HDR camera
systems (e.g., [
Nayar and Mitsunaga 2000; Unger and Gustavson
2007; Tocci et al. 2011; SpheronVR 2013; Kronander et al. 2013]).
Unfortunately, the high cost and general unavailability of these
cameras make them impractical for the average consumer.
On the other hand, still HDR photography has leveraged the fact
that a typical consumer camera can acquire a set of low dynamic
range (LDR) images at different exposures, which can then be
merged into a single HDR image [Mann and Picard 1995; Debevec
and Malik 1997]. However, most of the methods that address arti-
facts in dynamic scenes (e.g., [Zimmer et al. 2011; Sen et al. 2012])
only produce still images and cannot be used for HDR video.

The fundamental challenge is that producing high-quality HDR
video from a set of alternating LDR exposures requires reconstruct-
ing well-aligned and temporally coherent LDR images. This needs
to be done for each exposure in every frame so that the resulting
HDR video is free of artifacts. Optical flow based solutions [
Kang
et al. 2003; Mangiat and Gibson 2010; Ginger HDR 2013] are suit-
able for scenes with small motion, but fail with complex motion.
In these cases, they produce visible tearing and “ghosting” artifacts
due to the failure of optical flow near motion boundaries.
Our method builds upon the recent work on HDR reconstruction
for still images that poses the problem as a patch-based optimiza-
tion [
Sen et al. 2012]. Although this approach produces high-
quality still HDR images, it is unsuitable for HDR video due to
the lack of temporal coherency (see, e.g., ThrowingTowel3Exp in the supplementary materials¹).
In this work we propose a new, temporally coherent patch-based
optimization algorithm that can produce high-quality HDR video
from an input sequence of alternating exposures captured with an
off-the-shelf camera. We show how optical flow can be utilized in
conjunction with a patch-based method to achieve motion smooth-
ness, providing robustness to failures of optical flow in areas of
fast motion and occlusions. Where the optical flow fails, the patch-
based method synthesizes plausible textures and the artifacts are
typically confined to very small regions close to motion boundaries.
Masking effects in the human visual system make these artifacts
very difficult to detect in moving video.
Our key contribution is to combine optical flow with a patch-based
synthesis approach similar to Sen et al. [
2012] to achieve tempo-
ral coherency. We show that a simple combination of the two
components does not work well and propose a method to com-
pute spatially-varying search windows for handling complex mo-
tions. A secondary contribution is jitter suppression for temporal
coherency, using multiple motion models to regularize the patch-
based alignment in under-constrained regions. As a result of these
contributions, we are able to demonstrate high-quality HDR videos
for scenes with large camera and non-rigid scene motion.
2 Related work
The problem of HDR imaging has been extensively studied in the
past, although most of the previous work has focused on the recon-
struction of still HDR images. For brevity, we shall only consider
methods that have been specifically developed for or shown to
handle HDR video, and refer readers interested in general HDR
imaging to texts on the subject [
Reinhard et al. 2010].
As mentioned earlier, the systems that have produced perhaps the
most high-quality results to date have been specialized cameras that
capture HDR videos directly. These include cameras with special
sensors to measure a larger dynamic range [
Brajovic and Kanade
1996; Seger et al. 1999; Nayar and Mitsunaga 2000; Nayar and
Branzoi 2003; Unger and Gustavson 2007; Portz et al. 2013], or
with beam-splitters that split the light to different sensors so that
each measures a different portion of the radiance domain simulta-
neously [Tocci et al. 2011; Kronander et al. 2013]. However, these
approaches are limited by the fact that they require specialized, cus-
tom hardware, which make them expensive and less widespread.
One possible way to capture HDR video with conventional cameras
is to use external beam-splitters [McGuire et al. 2007; Cole and
Safai 2013]. However, this additional hardware makes the system
¹Some artifacts are difficult to observe in still images, and so in the paper we refer the reader to our supplementary video materials by scene name.
bulky and difficult to use. Moreover, even simple tasks like chang-
ing the focus or zooming become difficult because of the necessary
camera synchronization. Therefore, the more practical way is to use
a single camera that alternates exposures for each frame. Although
not all video cameras can currently do this, there are efforts to in-
crease the programmability of digital cameras (e.g., [
Adams et al.
2010]). Furthermore, it is not difficult to find off-the-shelf cameras
that can alternate exposures (e.g., the Basler acA2000-50gc cam-
era used in this work). This approach has been explored in the
past [Kang et al. 2003; Mangiat and Gibson 2010; Magic Lantern
2013], and we use it for our capture as well.
Kang et al. [2003] demonstrate the first practical method for gen-
erating HDR video using an off-the-shelf camera with a system
that acquires sequences that alternate between short and long ex-
posures. They first use optical flow to unidirectionally warp the
previous/next frames to a given frame. They then merge them to-
gether in the regions where the current frame is well-exposed with
a weighted blend to reject ghosting. For the over/under-exposed re-
gions of the current frame, they bidirectionally interpolate the pre-
vious/next frames using optical flow followed by a hierarchical ho-
mography algorithm to help with the alignment process. Although
Kang et al.'s method can increase the dynamic range of videos, their
algorithm has visible artifacts when the input video contains non-
rigid or fast motion as can be seen in Figs.
6 and 7. This problem is
due to the fact that the algorithm relies heavily on existing motion
estimation methods that are still prone to errors in these cases.
The recent work of Mangiat and Gibson [2010] is perhaps the state-
of-the-art for producing HDR video using off-the-shelf cameras.
To overcome the problems of gradient-based optical flow used in
Kang et al., they propose a block-based motion estimation approach
to approximate motion between adjacent frames. Moreover, they
propose a motion refinement stage and a filtering stage that uses
a cross-bilateral filter to remove the block boundary artifacts. In
follow-up work, Mangiat and Gibson [
2011] demonstrate improved
results by filtering the regions with large motion to hide the artifacts
of mis-registration. However, their results still suffer from blocking
artifacts, as shown in Fig. 6. Moreover, their method is designed to
handle sequences with only two exposures.
Finally, some publicly-available software has been developed to
capture alternating exposures and produce HDR video. For exam-
ple, the MagicLantern firmware available for certain Canon DSLR
cameras [
2013] has an HDR video mode that allows for capturing
video with alternating ISOs. The resulting video can then be used
with Ginger HDR [2013], which features a stand-alone “Merger”
tool that utilizes optical flow to register frames and produce an HDR
output. However, like the optical flow based method of Kang et al.,
it has many artifacts that are visible in scenes with large motion.
3 Proposed algorithm
In order to acquire an HDR video stream with a conventional video
camera, we must first capture an input video that alternates between
different exposures for each frame, as shown in Fig. 2. Formally,
given a set of $N$ LDR images taken by alternating between $M$ different exposures ($L_{\mathrm{ref},1}, L_{\mathrm{ref},2}, \ldots, L_{\mathrm{ref},N}$), our goal is to reconstruct the $N$ HDR frames ($H_n$, $n \in \{1, \ldots, N\}$) for the entire video sequence². To do this, our algorithm must reconstruct the missing LDR images at each frame ($L_{m,n}$, $m \in \{1, \ldots, M\}$, $m \neq \mathrm{ref}$), shown with dashed red squares in Fig. 2. Note we use the term "reference images" to refer to the LDR images captured by the camera.

²Note that the exposure of the reference image is not fixed and depends on the frame number. Therefore, the correct notation would be ref(n), but for ease of notation we skip this formality.
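To make the indexing concrete, the bookkeeping implied by this capture pattern can be sketched as below. The helper names are ours, and we assume the camera cycles through the M exposures in a fixed order, as in Fig. 2.

```python
# Sketch of the capture-pattern bookkeeping (hypothetical helper names; assumes
# the camera cycles through the M exposures in a fixed order, as in Fig. 2).

def ref_exposure_index(n: int, M: int) -> int:
    """Exposure index m captured by the camera at frame n (1-based), i.e. ref(n)."""
    return ((n - 1) % M) + 1

def missing_exposures(n: int, M: int) -> list[int]:
    """Exposure indices m != ref(n) that must be reconstructed at frame n."""
    ref = ref_exposure_index(n, M)
    return [m for m in range(1, M + 1) if m != ref]

# Example with M = 3 alternating exposures: frame 1 captures exposure 1 and is
# missing {2, 3}, frame 2 captures exposure 2 and is missing {1, 3}, and so on.
for n in range(1, 7):
    print(n, ref_exposure_index(n, 3), missing_exposures(n, 3))
```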

Figure 2: An example video sequence with N frames. To capture
HDR video, our off-the-shelf camera alternates between M differ-
ent exposures, capturing only one specific exposure at each frame
(shown with solid black squares). Our algorithm reconstructs the
missing exposures at each frame (dashed red squares) by doing a
patch search/vote on the two neighboring frames. To maximize the
temporal coherency, the patch searches are performed around an
estimated motion flow (given by the green arrows). Once these
missing LDR frames have been reconstructed, the different expo-
sures can be merged together for every frame to produce the final
sequence of HDR images.
To reconstruct the HDR images from the LDR inputs, Sen et al. [2012] proposed a patch-based optimization system for still HDR photography that satisfies two properties: 1) the final HDR image $H_n$ should be very close to the reference image $n$ after mapping it to the radiance domain $h(L_{\mathrm{ref},n})$ wherever $L_{\mathrm{ref},n}$ is well-exposed, and 2) $H_n$ should include information from the captured images at the $M$ different exposures neighboring frame $n$. Although this often works well for still images, their method is unsuitable for our application since it lacks temporal coherency (see ThrowingTowel3Exp in the supplementary materials), a necessity for high-quality HDR video. Furthermore, their method can also generate unsatisfactory results when a large region of the reference image is under- or over-exposed. This is particularly relevant for our video application since the reference frame must vary in exposure at each time instant, resulting in large missing regions in many reference frames. Therefore, a direct application of the Sen et al. method to video yields unacceptable results, as shown in Fig. 3.
To address the problem of temporal coherence, we first observe that
despite the motion from frame to frame in a video, the content of
consecutive frames is very similar. For example, the LDR images of
consecutive frames that have the same exposure (each of the rows in
Fig. 2) will be very similar. The second observation is that many dy-
namic scenes can be approximated using multiple large regions that
move coherently across consecutive frames. Guided by these ob-
servations and drawing some of the elements from the patch-based
optimization framework of Sen et al. [2012], we propose the fol-
lowing energy function for HDR video reconstruction:
$$
\begin{aligned}
E(\text{all } L_{m,n}\text{'s}) = \sum_{n=1}^{N} \sum_{p \in \text{pixels}} \Bigg[\;
& \alpha_{\mathrm{ref},n}^{(p)} \cdot \big( h(L_{\mathrm{ref},n})^{(p)} - H_n^{(p)} \big)^2 \\
{}+{} & \big( 1 - \alpha_{\mathrm{ref},n}^{(p)} \big) \cdot \sum_{m=1,\, m \neq \mathrm{ref}}^{M} \Lambda(L_{m,n}) \, \big( h(L_{m,n})^{(p)} - H_n^{(p)} \big)^2 \\
{}+{} & \big( 1 - \alpha_{\mathrm{ref},n}^{(p)} \big) \cdot \sum_{m=1}^{M} \mathrm{TBDS}(L_{m,n}, L_{m,n-1}, L_{m,n+1}) \Bigg]. \qquad (1)
\end{aligned}
$$
Figure 3: Three HDR frames of the ThrowingTowel3Exp
scene generated by both the method of Sen et al. [2012] and our
method. The method of Sen et al. works best when the reference
image is the middle exposure (middle). In the frames where the low
or high exposed images are the reference (top and bottom, respec-
tively), their method has artifacts, as indicated by the green arrows.
Our method generates plausible results in all cases.
In the first term, $h(L_{\mathrm{ref},n})$ is a function that maps the LDR image $L_{\mathrm{ref},n}$ to the linear radiance domain, and $\alpha_{\mathrm{ref},n}$ is a function (Fig. 5) that approximates how well each pixel in $L_{\mathrm{ref},n}$ is exposed. This term ensures that the HDR reconstruction $H_n$ is similar to $h(L_{\mathrm{ref},n})$ in an $L_2$ sense in the well-exposed regions. The second term ensures that all the LDR images in one frame are similar to the HDR image in that frame in an $L_2$ sense in the regions that are not well-exposed in the reference image. This term maintains the relationship between the HDR image and the LDRs that compose it, so it is weighted by the triangle function $\Lambda(\cdot)$ used for merging [Debevec and Malik 1997]. Finally, the third term helps enforce temporal coherence by leveraging ideas from Regenerative Morphing [Shechtman et al. 2010]. In this case, we propose to use temporal bidirectional similarity (TBDS) to measure the bidirectional similarity of the LDR image $L_{m,n}$ to its counterparts in the previous ($L_{m,n-1}$) and next ($L_{m,n+1}$) frames:

$$\mathrm{TBDS}(L_{m,n}, L_{m,n-1}, L_{m,n+1}) = \mathrm{BDS}(L_{m,n}, L_{m,n-1}) + \mathrm{BDS}(L_{m,n}, L_{m,n+1}). \qquad (2)$$
Here we use the patch-based bidirectional similarity (BDS) metric proposed by Simakov et al. [2008], except that we constrain the search based on the estimated local motion to further improve temporal coherence:

$$\mathrm{BDS}(T, S) = \frac{1}{|S|} \sum_{p \in \text{pixels}} \min_{i \,\in\, f_S^T(p) \pm w_S^T(p)} D\big(s(p), t(i)\big) \;+\; \frac{1}{|T|} \sum_{p \in \text{pixels}} \min_{i \,\in\, f_T^S(p) \pm w_T^S(p)} D\big(t(p), s(i)\big), \qquad (3)$$
where $s(p)$ and $t(p)$ denote the patches centered at pixel $p$ in the source and the target images, and $D(\cdot)$ refers to the sum of the squared differences (SSD) between two patches. We have modified the standard BDS equation by adding $f_S^T(p)$ and $w_S^T(p)$ to constrain our search: $f_S^T(p)$ is the approximate motion flow at pixel $p$ from $S$ to $T$, and $w_S^T(p)$ scales the search window around it. Intuitively, the first term (completeness) ensures that for every patch $s(p)$ in the source, there is a similar patch in the region defined by $f_S^T(p) \pm w_S^T(p)$ in the target image, and vice versa for the second term (coherence). As shown by Simakov et al. [2008], minimizing this metric implies that the target image contains most of the content from the source image in a visually coherent way. As a result, minimizing the third term in Eq. 1 ensures that each LDR image $L_{m,n}$ contains similar content to its temporal neighbors. Moreover, constraining the patch searches around an initial motion estimate results in temporal coherency in the output video.

Figure 4: To validate $f_n^{n-1}(p)$, the flow from $L_n$ to $L_{n-1}$ shown with a red arrow, we first compute $f_n^{n+1}(p)$ and $f_{n+1}^{n-1}$, shown with blue arrows. We then concatenate these two flows to get $f_{n+1}^{n-1}(p')$ where $p' = f_n^{n+1}(p)$. If this flow is inside a small window (shown in green) around $f_n^{n-1}(p)$, we keep it; otherwise we discard it. In this case, the flow shown in red will be discarded since it does not pass the consistency check.
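As a concrete reading of Eqs. 2 and 3, the following brute-force sketch spells out the constrained BDS and TBDS terms. The function names are ours; it assumes single-channel images in [0, 1], integer flow vectors, and square patches, and it uses exhaustive loops where the actual system would use an efficient approximate patch search.

```python
import numpy as np

def _patch(img_padded, y, x, half):
    """Square patch of side 2*half+1 centered at (y, x) of the unpadded image."""
    return img_padded[y:y + 2 * half + 1, x:x + 2 * half + 1]

def directional_term(src, tgt, flow_src_to_tgt, win, half=3):
    """One direction of Eq. 3: for every patch in `src`, the best SSD match in
    `tgt`, searched inside a square window of radius win[y, x] around the point
    the flow maps p to.  Returns the average best distance (the 1/|src| sum)."""
    H, W = src.shape
    src_p = np.pad(src, half, mode='edge')
    tgt_p = np.pad(tgt, half, mode='edge')
    total = 0.0
    for y in range(H):
        for x in range(W):
            cx = int(np.clip(x + flow_src_to_tgt[y, x, 0], 0, W - 1))  # dx
            cy = int(np.clip(y + flow_src_to_tgt[y, x, 1], 0, H - 1))  # dy
            w = int(win[y, x])
            best = np.inf
            for iy in range(max(0, cy - w), min(H, cy + w + 1)):
                for ix in range(max(0, cx - w), min(W, cx + w + 1)):
                    d = np.sum((_patch(src_p, y, x, half) -
                                _patch(tgt_p, iy, ix, half)) ** 2)
                    best = min(best, d)
            total += best
    return total / (H * W)

def bds(T, S, f_S_to_T, w_S_to_T, f_T_to_S, w_T_to_S):
    """Constrained bidirectional similarity of Eq. 3 (completeness + coherence)."""
    return (directional_term(S, T, f_S_to_T, w_S_to_T) +   # completeness
            directional_term(T, S, f_T_to_S, w_T_to_S))    # coherence

def tbds(L_n, L_prev, L_next, flows, wins):
    """Temporal BDS of Eq. 2: BDS(L_n, L_prev) + BDS(L_n, L_next).
    `flows` / `wins` hold the four per-frame flow fields and window maps,
    keyed by direction, e.g. flows['prev_to_n'] is the flow from frame n-1 to n."""
    return (bds(L_n, L_prev, flows['prev_to_n'], wins['prev_to_n'],
                flows['n_to_prev'], wins['n_to_prev']) +
            bds(L_n, L_next, flows['next_to_n'], wins['next_to_n'],
                flows['n_to_next'], wins['n_to_next']))
```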
In our algorithm, we first estimate a rough initial motion, then use
it to calculate a local search window size. We then minimize Eq. 1
using a two-stage iterative algorithm that iterates between the two
stages until convergence. This method reconstructs the missing
LDR images, which are finally combined to form the final HDR
results. Therefore, our method consists of three main steps:
1. Initial motion estimation (Sec. 3.1): A rough motion is estimated in the two directions between consecutive frames ($f_S^T(p)$ and $f_T^S(p)$ in Eq. 3). We use a planar model (similarity transform) for the global motion and optical flow for the local motion estimation.

2. Search window map computation (Sec. 3.2): A window size is computed for every flow vector ($w_S^T(p)$ and $w_T^S(p)$ in Eq. 3). This search window map is used as the search window size around each initial estimate of the motion.

3. HDR video reconstruction (Sec. 3.3): A two-stage iterative method is used to minimize Eq. 1. In the first stage, a multiscale constrained patch search-and-vote is performed to minimize the last term of Eq. 1, and, in the second stage, an HDR merge step with reference injection [Sen et al. 2012] is used to minimize the first two terms. The algorithm iterates between these two stages until convergence. This reconstructs the missing LDR images and produces the final HDR frames.
We now discuss each of these steps in turn in the following sections.
3.1 Initial motion estimation
Computing the BDS between a pair of images requires performing a search in two directions, each requiring a motion flow estimation as per Eq. 3. Therefore, the two BDS terms in Eq. 2 involve the estimation of four motion flows at every frame $n$: $f_n^{n-1}(p)$, $f_{n-1}^{n}(p)$, $f_n^{n+1}(p)$, and $f_{n+1}^{n}(p)$. Our motion estimation algorithm combines a similarity transform (rotation, translation, isotropic scale) for the global motion followed by an optical flow computation. The camera motion can be approximately removed by a similarity transform since there is little camera movement between adjacent frames, while local scene motion is estimated by optical flow.
The first step is to find a similarity transform from the next and previous frames ($L_{\mathrm{ref},n+1}$ and $L_{\mathrm{ref},n-1}$) to the current frame $L_{\mathrm{ref},n}$.
.
This requires raising the exposure of the image with the lower expo-
sure time to that of the other image to compensate for the exposure
differences. To do this, we first apply the inverse camera response
function to take the image with the lower exposure into the linear
radiance domain. We then multiply it by the exposure ratio of the
two images, and, finally apply the camera response function to map
the radiance values into the LDR domain. After performing the ex-
posure adjustment, we use RANSAC to find a dominant similarity
model from the correspondences between the two images. Next,
we warp the two neighboring images using the calculated similar-
ity transforms to remove the global motion and facilitate the local
motion estimation using optical flow. The rest of the process is per-
formed on the warped images.
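The exposure-matching step described above can be sketched as follows. The function and parameter names are ours, and a gamma curve stands in for the calibrated camera response function used in practice.

```python
import numpy as np

def match_exposure(low_img, t_low, t_high, crf, inv_crf):
    """Raise the exposure of `low_img` (exposure time t_low) to t_high.

    `crf` maps linear radiance to LDR pixel values and `inv_crf` is its
    inverse; both are assumed to act element-wise on arrays in [0, 1]."""
    radiance = inv_crf(low_img)              # LDR -> linear radiance domain
    radiance *= (t_high / t_low)             # scale by the exposure ratio
    return crf(np.clip(radiance, 0.0, 1.0))  # saturate and map back to LDR

# Example with a gamma-style response (an assumption; the real CRF is calibrated):
gamma = 2.2
crf = lambda r: np.power(r, 1.0 / gamma)
inv_crf = lambda v: np.power(v, gamma)
low = np.random.rand(4, 4)
adjusted = match_exposure(low, t_low=1 / 120, t_high=1 / 30, crf=crf, inv_crf=inv_crf)
```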
For simplicity, we only explain the process for estimating motion from frame $n$ to $n-1$ (denoted by $f_n^{n-1}(p)$), but the other flows are calculated in a similar manner. Since most optical flow algorithms rely on the brightness constancy assumption, we first adjust the exposure of all three images ($n-1$, $n$, $n+1$) to match the one with the highest exposure. This is necessary because our flow validation process, which will be explained later, works on all three images under the assumption that they were captured under the same conditions. After adjusting the exposures, we use the optical flow method of Liu [2009] to compute $f_n^{n-1}(p)$.
As is well known, this flow might be inaccurate because of noise, saturated pixels, or complex motions. One common way to detect erroneous flow is to compare $f_n^{n-1}(p)$ with $f_{n-1}^{n}(p)$ and keep the flows only if they are close to each other [Brox and Malik 2011]. However, we found this approach was not robust enough, often validating incorrect flow since errors are often symmetric. Therefore, we use a more robust flow consistency test based on triplets of frames, as shown in Fig. 4. To do this, we calculate the flows $f_n^{n-1}$, $f_n^{n+1}(p)$, and $f_{n+1}^{n-1}(p)$ and check if the concatenation $f_{n+1}^{n-1}(f_n^{n+1}(p))$ is inside a small window around $f_n^{n-1}(p)$. We keep the flow vectors where the concatenation is within a very small window $b_{\min}$, and otherwise we discard them as invalid. In addition, we discard the flows in the regions where $L_{\mathrm{ref},n}$ is highly saturated (all three channels greater than $\delta_s$) due to the lack of meaningful content. The final flow is obtained by concatenating this optical flow result with the similarity transform. In our implementation, we set $b_{\min}$ to 0.002 times the image size and $\delta_s$ to 0.99.
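A minimal sketch of this triplet consistency test is given below, assuming dense flow fields stored as (H, W, 2) arrays of (dx, dy) displacements and nearest-neighbor lookup when concatenating flows; the function names are ours.

```python
import numpy as np

def compose_flows(flow_a, flow_b):
    """Concatenate two dense flows: result(p) = flow_b(p) + flow_a(p + flow_b(p)).

    Nearest-neighbor lookup is used for simplicity; flows are (H, W, 2) arrays
    holding (dx, dy) displacements."""
    H, W, _ = flow_b.shape
    ys, xs = np.mgrid[0:H, 0:W]
    tx = np.clip(np.round(xs + flow_b[..., 0]).astype(int), 0, W - 1)
    ty = np.clip(np.round(ys + flow_b[..., 1]).astype(int), 0, H - 1)
    return flow_b + flow_a[ty, tx]

def validate_flow(f_n_to_prev, f_n_to_next, f_next_to_prev, b_min):
    """Keep f_{n->n-1}(p) only where the concatenation f_{n+1->n-1}(f_{n->n+1}(p))
    lands within a window of radius b_min (in pixels; the paper uses 0.002 times
    the image size) around it.  Returns a per-pixel validity mask."""
    composed = compose_flows(f_next_to_prev, f_n_to_next)
    diff = np.abs(composed - f_n_to_prev)
    return np.all(diff <= b_min, axis=-1)
```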
The estimated flow is used as a guide during the patch synthesis
process to constrain the search to a small, local window around the
flow vector. The size of the local window depends on the estimated accuracy of the optical flow, which is described next.
3.2 Search window map computation
The search window map defines the size of the search window
around each flow obtained in the previous step. This search win-
dow should be large enough so that the correct patch can be found
during the patch search process, but not so large that it causes tem-
poral jittering in the final result. The ideal size would be equal to
the distance of the correct motion to the estimated flow, but, since
we do not know the correct motion a priori, we need a method to

estimate a window size where a good match can be found. Note
that traditional optical flow confidence measures (e.g., [Jähne et al. 1999]) are not suitable for our purpose, as they usually give a score map reflecting the probability of estimating the correct motion.
We propose to use a patch search process to determine the size of
the search window around each flow vector. We start with a small
search window around the flow and perform a patch search to find a
similar patch. If a good match is not found within a given threshold,
the process is continued for several iterations, increasing the search
window each time. Once a good patch is found, we use that search
window size as the value in the search window map.
More explicitly, in order to find a search window $w_n^{n-1}(p)$ around a flow vector $f_n^{n-1}(p)$ from $L_{\mathrm{ref},n}$ to $L_{\mathrm{ref},n-1}$, we first match the exposure of the two images by raising the exposure of the lower one to match the higher one. For simplicity in this explanation, we simply use $L_{\mathrm{ref},n}$ and $L_{\mathrm{ref},n-1}$ to refer to the exposure-adjusted versions of these images. Next, for a patch in $L_{\mathrm{ref},n}$ centered on $p$, we look for the closest patch in an $L_2$ sense in a very small window $b_{\min}$ around $f_n^{n-1}(p)$. If the distance in color space between these two patches is less than a threshold $\delta_n$ (0.04 in our implementation), we assign $w_n^{n-1}(p) = b_{\min}$.
In order to penalize patches that diverge greatly in one color channel, we compute the patch SSD for each color channel separately and take the maximum distance as the final value. If the distance is above the threshold, we exponentially increase the window size by a factor of two and continue the patch search and distance comparison. If a proper window size has not been found after four iterations, we assign a large window size to this flow, $b_{\max}$, which we set equal to 0.4 times the image size.
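A sketch of this window-size search for a single flow vector is given below. The names are ours; a single channel is shown (the per-channel maximum described above would replace the plain SSD for color images), and the pixel is assumed to lie far enough from the image border for a full patch.

```python
import numpy as np

def window_size_at(src, tgt, p, flow_p, b_min, b_max,
                   delta_n=0.04, half=3, n_grow=4):
    """Determine the search-window size for one flow vector (Sec. 3.2 sketch).

    Starts from a window of radius b_min (pixels) around the flow vector and
    doubles it until a patch with normalized SSD below delta_n is found, or
    b_max is returned after n_grow attempts."""
    y, x = p
    cy, cx = y + int(round(flow_p[1])), x + int(round(flow_p[0]))
    H, W = tgt.shape
    ref_patch = src[y - half:y + half + 1, x - half:x + half + 1]
    win = int(b_min)
    for _ in range(n_grow):
        best = np.inf
        for iy in range(max(half, cy - win), min(H - half, cy + win + 1)):
            for ix in range(max(half, cx - win), min(W - half, cx + win + 1)):
                cand = tgt[iy - half:iy + half + 1, ix - half:ix + half + 1]
                best = min(best, np.mean((ref_patch - cand) ** 2))
        if best < delta_n:
            return win           # a good match exists within this window
        win *= 2                 # exponentially enlarge the search window
    return b_max                 # no good match found: fall back to b_max
```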
The regions where $L_{\mathrm{ref},n}$ is highly saturated (all three channels greater than $\delta_s$) do not have enough content, so we use a different strategy to define the search window size there. We first warp $L_{\mathrm{ref},n-1}$ using $f_n^{n-1}(p)$. If the pixel value of the warped image in these highly saturated regions is smaller than $\delta_s$, we assign a large search window $b_{\max}$; otherwise we assign a very small window $b_{\min}$.
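The fallback for highly saturated regions can be sketched as follows, assuming the warped previous frame is available as an image; how the per-channel comparison is made for the warped frame is our assumption (the channel maximum is used here), and the function name is ours.

```python
import numpy as np

def saturated_region_windows(L_ref_n, warped_prev, win, b_min, b_max, delta_s=0.99):
    """Override the search-window map where L_ref_n is highly saturated (Sec. 3.2 sketch).

    `L_ref_n` and `warped_prev` are (H, W, 3) images in [0, 1]; `win` is the
    window map from the patch-search procedure above.  Where the reference is
    saturated in all three channels, the warped previous frame decides: if it
    still carries unsaturated content there, a large window b_max is assigned;
    otherwise a very small window b_min is used."""
    saturated = np.all(L_ref_n > delta_s, axis=-1)
    neighbor_has_content = np.max(warped_prev, axis=-1) < delta_s  # our reading
    win = win.copy()
    win[saturated & neighbor_has_content] = b_max
    win[saturated & ~neighbor_has_content] = b_min
    return win
```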
Since we use a patch-based method to compute the search window
map, patches on the boundary between an accurate and inaccurate
flow region will cover both regions. Therefore, the patch distances
for these regions might be inaccurate, which makes the computed
search window unreliable. To alleviate this problem and give more
freedom to the patches in these regions, we dilate the search map by
twice the patch width (7 in our implementation) to compute the final
search map. This whole process is done for all other flow vectors
that are used in our TBDS calculation.
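The dilation step might be sketched with a grey-scale (maximum) dilation of the window map; reading "twice the patch width" as a 14-pixel square neighborhood is our assumption.

```python
from scipy.ndimage import grey_dilation

def dilate_search_map(win, patch_width=7):
    """Grow the window map so that patches straddling accurate and inaccurate
    flow regions receive the larger (more permissive) window of the two.
    The square structuring element and its exact size are our assumptions."""
    k = 2 * patch_width                 # "twice the patch width"
    return grey_dilation(win, size=(k, k))
```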
3.3 HDR video reconstruction
Once we have computed the initial motion and the search window
map, we minimize the energy in Eq. 1 using a two-stage algorithm.
In the first stage, a constrained patch search-and-vote process is per-
formed for each BDS term in Eq. 2, resulting in two voted images
for each LDR image, shown with dashed red squares in Fig. 2. We
then replace the LDR image with the average of these two voted
images. We continue this search-and-vote process several times to
minimize the third term in Eq. 1 [Shechtman et al. 2010]. The sec-
ond stage, similar to Sen et al. [2012], consists of merging all the
voted images and the reference image into an HDR image at each
frame. This process simultaneously minimizes the second term of
Eq. 1 and ensures that the first term is satisfied by injecting the
well-exposed pixels of the reference image into the HDR frame.
The algorithm iterates between these two stages until it converges.
Figure 5: The $\alpha_{\mathrm{ref},n}$ curves. (left) Sen et al. [2012], (middle) for search windows smaller than $b_{\max}$, (right) for search windows of size $b_{\max}$. Note the curves only differ in the under-exposed regions and they are the same as Sen et al. in the over-exposed regions.

Our algorithm begins by initializing all of the LDR images to the exposure-adjusted version of the reference image from the same frame. Then, for each LDR image $L_{m,n}$, we perform two bidirectional constrained patch searches against $L_{m,n+1}$ and $L_{m,n-1}$. These constrained searches are performed in a window (Sec. 3.2) around the initial motion flow estimate (Sec. 3.1). Next, in the voting process, the searched patches for completeness and coherence (the first and second terms in Eq. 3, respectively) are averaged with weights to generate a voted image for each BDS term in Eq. 2. The LDR image $L_{m,n}$ is then replaced with the average of these two voted images. We continue this search-and-vote process several times until convergence.

In the next step, the averaged images from all $M$ LDR sources in each frame ($L_{m,n}$, $m \in \{1, \ldots, M\}$) are combined using the HDR merge process, as proposed by Sen et al. [2012], to form an intermediate HDR frame $H_n$. The HDR merge process injects the well-exposed pixels of the reference image $L_{\mathrm{ref},n}$ into the HDR frame. For the over/under-exposed regions, we blend the reference image with the other LDR images in that frame using $\alpha_{\mathrm{ref},n}$ (shown in Fig. 5 (middle)). Finally, we replace each missing LDR image $L_{m,n}$ with $l_m(H_n)$, which maps the radiance values of $H_n$ to the exposure range of $m$. This process continues iteratively and in a multiscale fashion to minimize Eq. 1. Note that at coarse scales we reduce the size of the window according to the resolution of the image at that scale. At the coarsest scale, our images have 150 pixels in the smaller dimension, and we have a total of 6 scales with a ratio of $\sqrt[5]{x/150}$, where $x$ is the minimum dimension of the input frames. We use 20 iterations at the coarsest scale and linearly decrease this to 5 at the finest scale. Because we constrain the search to a small window around the initial flow, our optimization converges faster and with fewer iterations and scales relative to Sen et al.
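The multiscale schedule described in this paragraph can be sketched as follows; the function name and the rounding of iteration counts are ours.

```python
import numpy as np

def multiscale_schedule(min_dim, n_scales=6, coarsest_min_dim=150,
                        iters_coarse=20, iters_fine=5):
    """Per-scale resolution factor and iteration count (Sec. 3.3 sketch).

    The coarsest scale has 150 pixels on the smaller image dimension and the
    scale ratio is the fifth root of (min_dim / 150), so six scales span the
    range up to full resolution.  Iterations decrease linearly from 20 at the
    coarsest scale to 5 at the finest."""
    ratio = (min_dim / coarsest_min_dim) ** (1.0 / (n_scales - 1))
    schedule = []
    for s in range(n_scales):
        scale = (coarsest_min_dim * ratio ** s) / min_dim   # fraction of full res
        iters = int(round(np.interp(s, [0, n_scales - 1],
                                    [iters_coarse, iters_fine])))
        schedule.append((scale, iters))
    return schedule

# Example for input frames whose smaller dimension is 1080 pixels:
for scale, iters in multiscale_schedule(1080):
    print(f"scale {scale:.3f} of full resolution, {iters} search-and-vote iterations")
```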
Under-exposed regions must be treated carefully when estimating the HDR image to avoid artifacts from the alternating exposures. The parameter $\alpha_{\mathrm{ref},n}$ in Eq. 1 determines what is over/under-exposed and, therefore, controls the contribution of the reference image $L_{\mathrm{ref},n}$ to the HDR image. Sen et al. used the fixed trapezoid function shown in Fig. 5 (left) as $\alpha_{\mathrm{ref},n}$ (see Eq. 1), with a valid range of 0.1 to 0.9. This means that their method relies heavily on the reference image in the dark regions, which can be problematic when the reference image has low exposure. As can be seen in Fig. 3 (top), this washes out the details in the dark regions. Instead, to suppress the noise in the final HDR result, we set the minimum value of the valid range to 0.2 and use $(L_{\mathrm{ref},n}^{(p)} / 0.2)^2$ as $\alpha_{\mathrm{ref},n}$ in the under-exposed regions ($L_{\mathrm{ref},n}^{(p)} < 0.2$), as shown in Fig. 5 (middle). Moreover, in the places where the search map has a large window $b_{\max}$, we use the $\alpha_{\mathrm{ref},n}$ curve shown in Fig. 5 (right), which uses $(L_{\mathrm{ref},n}^{(p)} / 0.2)^{0.5}$ in the under-exposed regions. The reason is that the areas with large search windows are often occluded or undergoing very complex motion, so the reference needs to be injected more strongly to avoid deviating from it. Since the motion is usually fast in these regions, artifacts are difficult to perceive.
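A sketch of the modified $\alpha_{\mathrm{ref},n}$ curves is given below. The behavior below 0.2 follows the text; the shape of the over-exposed ramp above 0.9 is not spelled out in this excerpt, so a linear falloff is assumed there, and the function name is ours.

```python
import numpy as np

def alpha_ref(L, large_window=False, lo=0.2, hi=0.9):
    """Per-pixel alpha_{ref,n} sketch (Fig. 5, middle and right curves).

    `L` is the reference LDR image in [0, 1].  Below `lo` the curve follows
    (L / 0.2)^2 for normal search windows and (L / 0.2)^0.5 where the search
    window map equals b_max; above `hi` a linear falloff to zero at 1.0 is
    assumed (that side of the trapezoid is unchanged from Sen et al.)."""
    a = np.ones_like(L, dtype=float)
    under = L < lo
    exponent = 0.5 if large_window else 2.0
    a[under] = (L[under] / lo) ** exponent
    over = L > hi
    a[over] = np.clip((1.0 - L[over]) / (1.0 - hi), 0.0, 1.0)  # assumed ramp
    return a
```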
Although we constrain the patch search to a small window around
the rough initial motion flow, the HDR results might still exhibit a
small amount of jittering. This jittering occurs in the under- and
over-exposed regions of the reference image, where the valid infor-
