Patch-Based High Dynamic Range Video
Nima Khademi Kalantari¹   Eli Shechtman²   Connelly Barnes²,³   Soheil Darabi²   Dan B Goldman²   Pradeep Sen¹
¹University of California, Santa Barbara   ²Adobe   ³University of Virginia
[Figure 1 image panels; exposure labels, left to right: High, Low, Middle, High, Low]
Figure 1: (top row) Input video acquired using an off-the-shelf camera, which alternates between three exposures separated by two stops.
(bottom row) Our algorithm reconstructs the missing LDR images and generates an HDR image at each frame. The HDR video result for this
ThrowingTowel3Exp scene can be found in the supplementary materials. This layout is adapted from Kang et al. [2003].
Abstract
Despite significant progress in high dynamic range (HDR) imaging
over the years, it is still difficult to capture high-quality HDR video
with a conventional, off-the-shelf camera. The most practical way
to do this is to capture alternating exposures for every LDR frame
and then use an alignment method based on optical flow to register
the exposures together. However, this results in objectionable arti-
facts whenever there is complex motion and optical flow fails. To
address this problem, we propose a new approach for HDR recon-
struction from alternating exposure video sequences that combines
the advantages of optical flow and recently introduced patch-based
synthesis for HDR images. We use patch-based synthesis to enforce
similarity between adjacent frames, increasing temporal continuity.
To synthesize visually plausible solutions, we enforce constraints
from motion estimation coupled with a search window map that
guides the patch-based synthesis. This results in a novel recon-
struction algorithm that can produce high-quality HDR videos with
a standard camera. Furthermore, our method is able to synthesize
plausible texture and motion in fast-moving regions, where either
patch-based synthesis or optical flow alone would exhibit artifacts.
We present results of our reconstructed HDR video sequences that
are superior to those produced by current approaches.
CR Categories: I.4.1 [Computing Methodologies]: Image Pro-
cessing and Computer Vision—Digitization and Image Capture
Keywords: High dynamic range video, patch-based synthesis
Links: DL PDF WEB
1 Introduction
High dynamic range (HDR) imaging is now popular and becoming
more widespread. Most of the research to date, however, has fo-
cused on improving the capture of still HDR images, while HDR
video capture has received considerably less attention. This is a
serious deficit, since high-quality HDR video would significantly
improve our ability to capture dynamic environments as our eyes
perceive them. The reason for this lack of progress is that the bulk
of HDR video research has focused on specialized HDR camera
systems (e.g., [
Nayar and Mitsunaga 2000; Unger and Gustavson
2007; Tocci et al. 2011; SpheronVR 2013; Kronander et al. 2013]).
Unfortunately, the high cost and general unavailability of these
cameras make them impractical for the average consumer.
On the other hand, still HDR photography has leveraged the fact
that a typical consumer camera can acquire a set of low dynamic
range (LDR) images at different exposures, which can then be
merged into a single HDR image [Mann and Picard 1995; Debevec
and Malik 1997]. However, most of the methods that address arti-
facts in dynamic scenes (e.g., [Zimmer et al. 2011; Sen et al. 2012])
only produce still images and cannot be used for HDR video.

The fundamental challenge is that producing high-quality HDR
video from a set of alternating LDR exposures requires reconstruct-
ing well-aligned and temporally coherent LDR images. This needs
to be done for each exposure in every frame so that the resulting
HDR video is free of artifacts. Optical flow based solutions [
Kang
et al. 2003; Mangiat and Gibson 2010; Ginger HDR 2013] are suit-
able for scenes with small motion, but fail with complex motion.
In these cases, they produce visible tearing and “ghosting” artifacts
due to the failure of optical flow near motion boundaries.
Our method builds upon the recent work on HDR reconstruction
for still images that poses the problem as a patch-based optimiza-
tion [
Sen et al. 2012]. Although this approach produces high-
quality still HDR images, it is unsuitable for HDR video due to
the lack of temporal coherency (see, e.g., ThrowingTowel3Exp in the supplementary materials¹).
In this work we propose a new, temporally coherent patch-based
optimization algorithm that can produce high-quality HDR video
from an input sequence of alternating exposures captured with an
off-the-shelf camera. We show how optical flow can be utilized in
conjunction with a patch-based method to achieve motion smooth-
ness, providing robustness to failures of optical flow in areas of
fast motion and occlusions. Where the optical flow fails, the patch-
based method synthesizes plausible textures and the artifacts are
typically confined to very small regions close to motion boundaries.
Masking effects in the human visual system make these artifacts
very difficult to detect in moving video.
Our key contribution is to combine optical flow with a patch-based
synthesis approach similar to Sen et al. [
2012] to achieve tempo-
ral coherency. We show that a simple combination of the two
components does not work well and propose a method to com-
pute spatially-varying search windows for handling complex mo-
tions. A secondary contribution is jitter suppression for temporal
coherency, using multiple motion models to regularize the patch-
based alignment in under-constrained regions. As a result of these
contributions, we are able to demonstrate high-quality HDR videos
for scenes with large camera and non-rigid scene motion.
2 Related work
The problem of HDR imaging has been extensively studied in the
past, although most of the previous work has focused on the recon-
struction of still HDR images. For brevity, we shall only consider
methods that have been specifically developed for or shown to
handle HDR video, and refer readers interested in general HDR
imaging to texts on the subject [
Reinhard et al. 2010].
As mentioned earlier, the systems that have produced perhaps the
most high-quality results to date have been specialized cameras that
capture HDR videos directly. These include cameras with special
sensors to measure a larger dynamic range [
Brajovic and Kanade
1996; Seger et al. 1999; Nayar and Mitsunaga 2000; Nayar and
Branzoi 2003; Unger and Gustavson 2007; Portz et al. 2013], or
with beam-splitters that split the light to different sensors so that
each measures a different portion of the radiance domain simulta-
neously [Tocci et al. 2011; Kronander et al. 2013]. However, these
approaches are limited by the fact that they require specialized, cus-
tom hardware, which make them expensive and less widespread.
One possible way to capture HDR video with conventional cameras
is to use external beam-splitters [McGuire et al. 2007; Cole and
Safai 2013]. However, this additional hardware makes the system
¹Some artifacts are difficult to observe in still images, and so in the paper we refer the reader to our supplementary video materials by scene name.
bulky and difficult to use. Moreover, even simple tasks like chang-
ing the focus or zooming become difficult because of the necessary
camera synchronization. Therefore, the more practical way is to use
a single camera that alternates exposures for each frame. Although
not all video cameras can currently do this, there are efforts to in-
crease the programmability of digital cameras (e.g., [
Adams et al.
2010]). Furthermore, it is not difficult to find off-the-shelf cameras
that can alternate exposures (e.g., the Basler acA2000-50gc cam-
era used in this work). This approach has been explored in the
past [Kang et al. 2003; Mangiat and Gibson 2010; Magic Lantern
2013], and we use it for our capture as well.
Kang et al. [2003] demonstrate the first practical method for gen-
erating HDR video using an off-the-shelf camera with a system
that acquires sequences that alternate between short and long ex-
posures. They first use optical flow to unidirectionally warp the
previous/next frames to a given frame. They then merge them to-
gether in the regions where the current frame is well-exposed with
a weighted blend to reject ghosting. For the over/under-exposed re-
gions of the current frame, they bidirectionally interpolate the pre-
vious/next frames using optical flow followed by a hierarchical ho-
mography algorithm to help with the alignment process. Although
Kang et al.'s method can increase the dynamic range of videos, their
algorithm has visible artifacts when the input video contains non-
rigid or fast motion as can be seen in Figs.
6 and 7. This problem is
due to the fact that the algorithm relies heavily on existing motion
estimation methods that are still prone to errors in these cases.
The recent work of Mangiat and Gibson [2010] is perhaps the state-
of-the-art for producing HDR video using off-the-shelf cameras.
To overcome the problems of gradient-based optical flow used in
Kang et al., they propose a block-based motion estimation approach
to approximate motion between adjacent frames. Moreover, they
propose a motion refinement stage and a filtering stage that uses
a cross-bilateral filter to remove the block boundary artifacts. In
follow-up work, Mangiat and Gibson [
2011] demonstrate improved
results by filtering the regions with large motion to hide the artifacts
of mis-registration. However, their results still suffer from blocking
artifacts, as shown in Fig. 6. Moreover, their method is designed to
handle sequences with only two exposures.
Finally, some publicly-available software has been developed to
capture alternating exposures and produce HDR video. For exam-
ple, the MagicLantern firmware available for certain Canon DSLR
cameras [
2013] has an HDR video mode that allows for capturing
video with alternating ISOs. The resulting video can then be used
with Ginger HDR [2013], which features a stand-alone “Merger”
tool that utilizes optical flow to register frames and produce an HDR
output. However, like the optical flow based method of Kang et al.,
it has many artifacts that are visible in scenes with large motion.
3 Proposed algorithm
In order to acquire an HDR video stream with a conventional video
camera, we must first capture an input video that alternates between
different exposures for each frame, as shown in Fig. 2. Formally,
given a set of $N$ LDR images taken by alternating between $M$ different exposures ($L_{\mathrm{ref},1}, L_{\mathrm{ref},2}, \ldots, L_{\mathrm{ref},N}$), our goal is to reconstruct the $N$ HDR frames ($H_n$, $n \in \{1, \ldots, N\}$) for the entire video sequence². To do this, our algorithm must reconstruct the missing LDR images at each frame ($L_{m,n}$, $m \in \{1, \ldots, M\}$, $m \neq \mathrm{ref}$), shown with dashed red squares in Fig. 2. Note we use the term "reference images" to refer to the LDR images captured by the camera.

²Note that the exposure of the reference image is not fixed and depends on the frame number. Therefore, the correct notation would be ref(n), but for ease of notation we skip this formality.
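To make the indexing concrete, the bookkeeping implied by this capture pattern can be sketched as below. The helper names are ours, and we assume the camera cycles through the M exposures in a fixed order, as in Fig. 2.

```python
# Sketch of the capture-pattern bookkeeping (hypothetical helper names; assumes
# the camera cycles through the M exposures in a fixed order, as in Fig. 2).

def ref_exposure_index(n: int, M: int) -> int:
    """Exposure index m captured by the camera at frame n (1-based), i.e. ref(n)."""
    return ((n - 1) % M) + 1

def missing_exposures(n: int, M: int) -> list[int]:
    """Exposure indices m != ref(n) that must be reconstructed at frame n."""
    ref = ref_exposure_index(n, M)
    return [m for m in range(1, M + 1) if m != ref]

# Example with M = 3 alternating exposures: frame 1 captures exposure 1 and is
# missing {2, 3}, frame 2 captures exposure 2 and is missing {1, 3}, and so on.
for n in range(1, 7):
    print(n, ref_exposure_index(n, 3), missing_exposures(n, 3))
```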

Figure 2: An example video sequence with N frames. To capture
HDR video, our off-the-shelf camera alternates between M differ-
ent exposures, capturing only one specific exposure at each frame
(shown with solid black squares). Our algorithm reconstructs the
missing exposures at each frame (dashed red squares) by doing a
patch search/vote on the two neighboring frames. To maximize the
temporal coherency, the patch searches are performed around an
estimated motion flow (given by the green arrows). Once these
missing LDR frames have been reconstructed, the different expo-
sures can be merged together for every frame to produce the final
sequence of HDR images.
To reconstruct the HDR images from the LDR inputs, Sen et al. [2012] proposed a patch-based optimization system for still HDR photography that satisfies two properties: 1) the final HDR image $H_n$ should be very close to the reference image $n$ after mapping it to the radiance domain $h(L_{\mathrm{ref},n})$ wherever $L_{\mathrm{ref},n}$ is well-exposed, and 2) $H_n$ should include information from the captured images at the $M$ different exposures neighboring frame $n$. Although this often works well for still images, their method is unsuitable for our application since it lacks temporal coherency (see ThrowingTowel3Exp in the supplementary materials), a necessity for high-quality HDR video. Furthermore, their method can also generate unsatisfactory results when a large region of the reference image is under- or over-exposed. This is particularly relevant for our video application since the reference frame must vary in exposure at each time instant, resulting in large missing regions in many reference frames. Therefore, a direct application of the Sen et al. method to video yields unacceptable results, as shown in Fig. 3.
To address the problem of temporal coherence, we first observe that
despite the motion from frame to frame in a video, the content of
consecutive frames is very similar. For example, the LDR images of
consecutive frames that have the same exposure (each of the rows in
Fig. 2) will be very similar. The second observation is that many dy-
namic scenes can be approximated using multiple large regions that
move coherently across consecutive frames. Guided by these ob-
servations and drawing some of the elements from the patch-based
optimization framework of Sen et al. [2012], we propose the fol-
lowing energy function for HDR video reconstruction:
$$
\begin{aligned}
E(\text{all } L_{m,n}\text{'s}) = \sum_{n=1}^{N} \sum_{p \in \text{pixels}} \Bigg[\;
& \alpha_{\mathrm{ref},n}^{(p)} \cdot \big( h(L_{\mathrm{ref},n})^{(p)} - H_n^{(p)} \big)^2 \\
{}+{} & \big( 1 - \alpha_{\mathrm{ref},n}^{(p)} \big) \cdot \sum_{m=1,\, m \neq \mathrm{ref}}^{M} \Lambda(L_{m,n}) \, \big( h(L_{m,n})^{(p)} - H_n^{(p)} \big)^2 \\
{}+{} & \big( 1 - \alpha_{\mathrm{ref},n}^{(p)} \big) \cdot \sum_{m=1}^{M} \mathrm{TBDS}(L_{m,n}, L_{m,n-1}, L_{m,n+1}) \Bigg]. \qquad (1)
\end{aligned}
$$
Figure 3: Three HDR frames of the ThrowingTowel3Exp
scene generated by both the method of Sen et al. [2012] and our
method. The method of Sen et al. works best when the reference
image is the middle exposure (middle). In the frames where the low
or high exposed images are the reference (top and bottom, respec-
tively), their method has artifacts, as indicated by the green arrows.
Our method generates plausible results in all cases.
In the first term, $h(L_{\mathrm{ref},n})$ is a function that maps the LDR image $L_{\mathrm{ref},n}$ to the linear radiance domain, and $\alpha_{\mathrm{ref},n}$ is a function (Fig. 5) that approximates how well each pixel in $L_{\mathrm{ref},n}$ is exposed. This term ensures that the HDR reconstruction $H_n$ is similar to $h(L_{\mathrm{ref},n})$ in an $L_2$ sense in the well-exposed regions. The second term ensures that all the LDR images in one frame are similar to the HDR image in that frame in an $L_2$ sense in the regions that are not well-exposed in the reference image. This term maintains the relationship between the HDR image and the LDRs that compose it, so it is weighted by the triangle function $\Lambda(\cdot)$ used for merging [Debevec and Malik 1997]. Finally, the third term helps enforce temporal coherence by leveraging ideas from Regenerative Morphing [Shechtman et al. 2010]. In this case, we propose to use temporal bidirectional similarity (TBDS) to measure the bidirectional similarity of the LDR image $L_{m,n}$ to its counterparts in the previous ($L_{m,n-1}$) and next ($L_{m,n+1}$) frames:

$$\mathrm{TBDS}(L_{m,n}, L_{m,n-1}, L_{m,n+1}) = \mathrm{BDS}(L_{m,n}, L_{m,n-1}) + \mathrm{BDS}(L_{m,n}, L_{m,n+1}). \qquad (2)$$
Here we use the patch-based bidirectional similarity (BDS) metric proposed by Simakov et al. [2008], except that we constrain the search based on the estimated local motion to further improve temporal coherence:

$$\mathrm{BDS}(T, S) = \frac{1}{|S|} \sum_{p \in \text{pixels}} \min_{i \,\in\, f_S^T(p) \pm w_S^T(p)} D\big(s(p), t(i)\big) \;+\; \frac{1}{|T|} \sum_{p \in \text{pixels}} \min_{i \,\in\, f_T^S(p) \pm w_T^S(p)} D\big(t(p), s(i)\big), \qquad (3)$$
where $s(p)$ and $t(p)$ denote the patches centered at pixel $p$ in the source and the target images, and $D(\cdot)$ refers to the sum of the squared differences (SSD) between two patches. We have modified the standard BDS equation by adding $f_S^T(p)$ and $w_S^T(p)$ to constrain our search: $f_S^T(p)$ is the approximate motion flow at pixel $p$ from $S$ to $T$, and $w_S^T(p)$ scales the search window around it. Intuitively, the first term (completeness) ensures that for every patch $s(p)$ in the source, there is a similar patch in the region defined by $f_S^T(p) \pm w_S^T(p)$ in the target image, and vice versa for the second term (coherence). As shown by Simakov et al. [2008], minimizing this metric implies that the target image contains most of the content from the source image in a visually coherent way. As a result, minimizing the third term in Eq. 1 ensures that each LDR image $L_{m,n}$ contains similar content to its temporal neighbors. Moreover, constraining the patch searches around an initial motion estimate results in temporal coherency in the output video.

Figure 4: To validate $f_n^{n-1}(p)$, the flow from $L_n$ to $L_{n-1}$ shown with a red arrow, we first compute $f_n^{n+1}(p)$ and $f_{n+1}^{n-1}$, shown with blue arrows. We then concatenate these two flows to get $f_{n+1}^{n-1}(p')$ where $p' = f_n^{n+1}(p)$. If this flow is inside a small window (shown in green) around $f_n^{n-1}(p)$, we keep it; otherwise we discard it. In this case, the flow shown in red will be discarded since it does not pass the consistency check.
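As a concrete reading of Eqs. 2 and 3, the following brute-force sketch spells out the constrained BDS and TBDS terms. The function names are ours; it assumes single-channel images in [0, 1], integer flow vectors, and square patches, and it uses exhaustive loops where the actual system would use an efficient approximate patch search.

```python
import numpy as np

def _patch(img_padded, y, x, half):
    """Square patch of side 2*half+1 centered at (y, x) of the unpadded image."""
    return img_padded[y:y + 2 * half + 1, x:x + 2 * half + 1]

def directional_term(src, tgt, flow_src_to_tgt, win, half=3):
    """One direction of Eq. 3: for every patch in `src`, the best SSD match in
    `tgt`, searched inside a square window of radius win[y, x] around the point
    the flow maps p to.  Returns the average best distance (the 1/|src| sum)."""
    H, W = src.shape
    src_p = np.pad(src, half, mode='edge')
    tgt_p = np.pad(tgt, half, mode='edge')
    total = 0.0
    for y in range(H):
        for x in range(W):
            cx = int(np.clip(x + flow_src_to_tgt[y, x, 0], 0, W - 1))  # dx
            cy = int(np.clip(y + flow_src_to_tgt[y, x, 1], 0, H - 1))  # dy
            w = int(win[y, x])
            best = np.inf
            for iy in range(max(0, cy - w), min(H, cy + w + 1)):
                for ix in range(max(0, cx - w), min(W, cx + w + 1)):
                    d = np.sum((_patch(src_p, y, x, half) -
                                _patch(tgt_p, iy, ix, half)) ** 2)
                    best = min(best, d)
            total += best
    return total / (H * W)

def bds(T, S, f_S_to_T, w_S_to_T, f_T_to_S, w_T_to_S):
    """Constrained bidirectional similarity of Eq. 3 (completeness + coherence)."""
    return (directional_term(S, T, f_S_to_T, w_S_to_T) +   # completeness
            directional_term(T, S, f_T_to_S, w_T_to_S))    # coherence

def tbds(L_n, L_prev, L_next, flows, wins):
    """Temporal BDS of Eq. 2: BDS(L_n, L_prev) + BDS(L_n, L_next).
    `flows` / `wins` hold the four per-frame flow fields and window maps,
    keyed by direction, e.g. flows['prev_to_n'] is the flow from frame n-1 to n."""
    return (bds(L_n, L_prev, flows['prev_to_n'], wins['prev_to_n'],
                flows['n_to_prev'], wins['n_to_prev']) +
            bds(L_n, L_next, flows['next_to_n'], wins['next_to_n'],
                flows['n_to_next'], wins['n_to_next']))
```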
In our algorithm, we first estimate a rough initial motion, then use
it to calculate a local search window size. We then minimize Eq. 1
using a two-stage iterative algorithm that iterates between the two
stages until convergence. This method reconstructs the missing
LDR images, which are finally combined to form the final HDR
results. Therefore, our method consists of three main steps:
1. Initial motion estimation (Sec. 3.1): A rough motion is estimated in the two directions between consecutive frames ($f_S^T(p)$ and $f_T^S(p)$ in Eq. 3). We use a planar model (similarity transform) for the global motion and optical flow for the local motion estimation.

2. Search window map computation (Sec. 3.2): A window size is computed for every flow vector ($w_S^T(p)$ and $w_T^S(p)$ in Eq. 3). This search window map is used as the search window size around each initial estimate of the motion.

3. HDR video reconstruction (Sec. 3.3): A two-stage iterative method is used to minimize Eq. 1. In the first stage, a multiscale constrained patch search-and-vote is performed to minimize the last term of Eq. 1, and, in the second stage, an HDR merge step with reference injection [Sen et al. 2012] is used to minimize the first two terms. The algorithm iterates between these two stages until convergence. This reconstructs the missing LDR images and produces the final HDR frames.
We now discuss each of these steps in turn in the following sections.
3.1 Initial motion estimation
Computing the BDS between a pair of images requires performing a search in two directions, each requiring a motion flow estimation as per Eq. 3. Therefore, the two BDS terms in Eq. 2 involve the estimation of four motion flows at every frame $n$: $f_n^{n-1}(p)$, $f_{n-1}^{n}(p)$, $f_n^{n+1}(p)$, and $f_{n+1}^{n}(p)$. Our motion estimation algorithm combines a similarity transform (rotation, translation, isotropic scale) for the global motion followed by an optical flow computation. The camera motion can be approximately removed by a similarity transform since there is little camera movement between adjacent frames, while local scene motion is estimated by optical flow.
The first step is to find a similarity transform from the next and previous frames ($L_{\mathrm{ref},n+1}$ and $L_{\mathrm{ref},n-1}$) to the current frame $L_{\mathrm{ref},n}$.
.
This requires raising the exposure of the image with the lower expo-
sure time to that of the other image to compensate for the exposure
differences. To do this, we first apply the inverse camera response
function to take the image with the lower exposure into the linear
radiance domain. We then multiply it by the exposure ratio of the
two images, and, finally apply the camera response function to map
the radiance values into the LDR domain. After performing the ex-
posure adjustment, we use RANSAC to find a dominant similarity
model from the correspondences between the two images. Next,
we warp the two neighboring images using the calculated similar-
ity transforms to remove the global motion and facilitate the local
motion estimation using optical flow. The rest of the process is per-
formed on the warped images.
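The exposure-matching step described above can be sketched as follows. The function and parameter names are ours, and a gamma curve stands in for the calibrated camera response function used in practice.

```python
import numpy as np

def match_exposure(low_img, t_low, t_high, crf, inv_crf):
    """Raise the exposure of `low_img` (exposure time t_low) to t_high.

    `crf` maps linear radiance to LDR pixel values and `inv_crf` is its
    inverse; both are assumed to act element-wise on arrays in [0, 1]."""
    radiance = inv_crf(low_img)              # LDR -> linear radiance domain
    radiance *= (t_high / t_low)             # scale by the exposure ratio
    return crf(np.clip(radiance, 0.0, 1.0))  # saturate and map back to LDR

# Example with a gamma-style response (an assumption; the real CRF is calibrated):
gamma = 2.2
crf = lambda r: np.power(r, 1.0 / gamma)
inv_crf = lambda v: np.power(v, gamma)
low = np.random.rand(4, 4)
adjusted = match_exposure(low, t_low=1 / 120, t_high=1 / 30, crf=crf, inv_crf=inv_crf)
```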
For simplicity, we only explain the process for estimating motion from frame $n$ to $n-1$ (denoted by $f_n^{n-1}(p)$), but the other flows are calculated in a similar manner. Since most optical flow algorithms rely on the brightness constancy assumption, we first adjust the exposure of all three images ($n-1$, $n$, $n+1$) to match the one with the highest exposure. This is necessary because our flow validation process, which will be explained later, works on all three images under the assumption that they were captured under the same conditions. After adjusting the exposures, we use the optical flow method of Liu [2009] to compute $f_n^{n-1}(p)$.
As is well known, this flow might be inaccurate because of noise, saturated pixels, or complex motions. One common way to detect erroneous flow is to compare $f_n^{n-1}(p)$ with $f_{n-1}^{n}(p)$ and keep the flows only if they are close to each other [Brox and Malik 2011]. However, we found this approach was not robust enough, often validating incorrect flow since errors are often symmetric. Therefore, we use a more robust flow consistency test based on triplets of frames, as shown in Fig. 4. To do this, we calculate the flows $f_n^{n-1}$, $f_n^{n+1}(p)$, and $f_{n+1}^{n-1}(p)$ and check if the concatenation $f_{n+1}^{n-1}(f_n^{n+1}(p))$ is inside a small window around $f_n^{n-1}(p)$. We keep the flow vectors where the concatenation is within a very small window $b_{\min}$, and otherwise we discard them as invalid. In addition, we discard the flows in the regions where $L_{\mathrm{ref},n}$ is highly saturated (all three channels greater than $\delta_s$) due to the lack of meaningful content. The final flow is obtained by concatenating this optical flow result with the similarity transform. In our implementation, we set $b_{\min}$ to 0.002 times the image size and $\delta_s$ to 0.99.
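A minimal sketch of this triplet consistency test is given below, assuming dense flow fields stored as (H, W, 2) arrays of (dx, dy) displacements and nearest-neighbor lookup when concatenating flows; the function names are ours.

```python
import numpy as np

def compose_flows(flow_a, flow_b):
    """Concatenate two dense flows: result(p) = flow_b(p) + flow_a(p + flow_b(p)).

    Nearest-neighbor lookup is used for simplicity; flows are (H, W, 2) arrays
    holding (dx, dy) displacements."""
    H, W, _ = flow_b.shape
    ys, xs = np.mgrid[0:H, 0:W]
    tx = np.clip(np.round(xs + flow_b[..., 0]).astype(int), 0, W - 1)
    ty = np.clip(np.round(ys + flow_b[..., 1]).astype(int), 0, H - 1)
    return flow_b + flow_a[ty, tx]

def validate_flow(f_n_to_prev, f_n_to_next, f_next_to_prev, b_min):
    """Keep f_{n->n-1}(p) only where the concatenation f_{n+1->n-1}(f_{n->n+1}(p))
    lands within a window of radius b_min (in pixels; the paper uses 0.002 times
    the image size) around it.  Returns a per-pixel validity mask."""
    composed = compose_flows(f_next_to_prev, f_n_to_next)
    diff = np.abs(composed - f_n_to_prev)
    return np.all(diff <= b_min, axis=-1)
```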
The estimated flow is used as a guide during the patch synthesis
process to constrain the search to a small, local window around the
flow vector. The size of the local window depends on the estimated accuracy of the optical flow, which is described next.
3.2 Search window map computation
The search window map defines the size of the search window
around each flow obtained in the previous step. This search win-
dow should be large enough so that the correct patch can be found
during the patch search process, but not so large that it causes tem-
poral jittering in the final result. The ideal size would be equal to
the distance of the correct motion to the estimated flow, but, since
we do not know the correct motion a priori, we need a method to

estimate a window size where a good match can be found. Note
that traditional optical flow confidence measures (e.g., [Jähne et al. 1999]) are not suitable for our purpose, as they usually give a score map reflecting the probability of estimating the correct motion.
We propose to use a patch search process to determine the size of
the search window around each flow vector. We start with a small
search window around the flow and perform a patch search to find a
similar patch. If a good match is not found within a given threshold,
the process is continued for several iterations, increasing the search
window each time. Once a good patch is found, we use that search
window size as the value in the search window map.
More explicitly, in order to find a search window $w_n^{n-1}(p)$ around a flow vector $f_n^{n-1}(p)$ from $L_{\mathrm{ref},n}$ to $L_{\mathrm{ref},n-1}$, we first match the exposure of the two images by raising the exposure of the lower one to match the higher one. For simplicity in this explanation, we simply use $L_{\mathrm{ref},n}$ and $L_{\mathrm{ref},n-1}$ to refer to the exposure-adjusted versions of these images. Next, for a patch in $L_{\mathrm{ref},n}$ centered on $p$, we look for the closest patch in an $L_2$ sense in a very small window $b_{\min}$ around $f_n^{n-1}(p)$. If the distance in color space between these two patches is less than a threshold $\delta_n$ (0.04 in our implementation), we assign $w_n^{n-1}(p) = b_{\min}$.
In order to penalize patches that diverge greatly in one color channel, we compute the patch SSD for each color channel separately and take the maximum distance as the final value. If the distance is above the threshold, we exponentially increase the window size by a factor of two and continue the patch search and distance comparison. If a proper window size has not been found after four iterations, we assign a large window size to this flow, $b_{\max}$, which we set equal to 0.4 times the image size.
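A sketch of this window-size search for a single flow vector is given below. The names are ours; a single channel is shown (the per-channel maximum described above would replace the plain SSD for color images), and the pixel is assumed to lie far enough from the image border for a full patch.

```python
import numpy as np

def window_size_at(src, tgt, p, flow_p, b_min, b_max,
                   delta_n=0.04, half=3, n_grow=4):
    """Determine the search-window size for one flow vector (Sec. 3.2 sketch).

    Starts from a window of radius b_min (pixels) around the flow vector and
    doubles it until a patch with normalized SSD below delta_n is found, or
    b_max is returned after n_grow attempts."""
    y, x = p
    cy, cx = y + int(round(flow_p[1])), x + int(round(flow_p[0]))
    H, W = tgt.shape
    ref_patch = src[y - half:y + half + 1, x - half:x + half + 1]
    win = int(b_min)
    for _ in range(n_grow):
        best = np.inf
        for iy in range(max(half, cy - win), min(H - half, cy + win + 1)):
            for ix in range(max(half, cx - win), min(W - half, cx + win + 1)):
                cand = tgt[iy - half:iy + half + 1, ix - half:ix + half + 1]
                best = min(best, np.mean((ref_patch - cand) ** 2))
        if best < delta_n:
            return win           # a good match exists within this window
        win *= 2                 # exponentially enlarge the search window
    return b_max                 # no good match found: fall back to b_max
```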
The regions where $L_{\mathrm{ref},n}$ is highly saturated (all three channels greater than $\delta_s$) do not have enough content, so we use a different strategy to define the search window size there. We first warp $L_{\mathrm{ref},n-1}$ using $f_n^{n-1}(p)$. If the pixel value of the warped image in these highly saturated regions is smaller than $\delta_s$, we assign a large search window $b_{\max}$; otherwise we assign a very small window $b_{\min}$.
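The fallback for highly saturated regions can be sketched as follows, assuming the warped previous frame is available as an image; how the per-channel comparison is made for the warped frame is our assumption (the channel maximum is used here), and the function name is ours.

```python
import numpy as np

def saturated_region_windows(L_ref_n, warped_prev, win, b_min, b_max, delta_s=0.99):
    """Override the search-window map where L_ref_n is highly saturated (Sec. 3.2 sketch).

    `L_ref_n` and `warped_prev` are (H, W, 3) images in [0, 1]; `win` is the
    window map from the patch-search procedure above.  Where the reference is
    saturated in all three channels, the warped previous frame decides: if it
    still carries unsaturated content there, a large window b_max is assigned;
    otherwise a very small window b_min is used."""
    saturated = np.all(L_ref_n > delta_s, axis=-1)
    neighbor_has_content = np.max(warped_prev, axis=-1) < delta_s  # our reading
    win = win.copy()
    win[saturated & neighbor_has_content] = b_max
    win[saturated & ~neighbor_has_content] = b_min
    return win
```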
Since we use a patch-based method to compute the search window
map, patches on the boundary between an accurate and inaccurate
flow region will cover both regions. Therefore, the patch distances
for these regions might be inaccurate, which makes the computed
search window unreliable. To alleviate this problem and give more
freedom to the patches in these regions, we dilate the search map by
twice the patch width (7 in our implementation) to compute the final
search map. This whole process is done for all other flow vectors
that are used in our TBDS calculation.
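The dilation step might be sketched with a grey-scale (maximum) dilation of the window map; reading "twice the patch width" as a 14-pixel square neighborhood is our assumption.

```python
from scipy.ndimage import grey_dilation

def dilate_search_map(win, patch_width=7):
    """Grow the window map so that patches straddling accurate and inaccurate
    flow regions receive the larger (more permissive) window of the two.
    The square structuring element and its exact size are our assumptions."""
    k = 2 * patch_width                 # "twice the patch width"
    return grey_dilation(win, size=(k, k))
```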
3.3 HDR video reconstruction
Once we have computed the initial motion and the search window
map, we minimize the energy in Eq. 1 using a two-stage algorithm.
In the first stage, a constrained patch search-and-vote process is per-
formed for each BDS term in Eq. 2, resulting in two voted images
for each LDR image, shown with dashed red squares in Fig. 2. We
then replace the LDR image with the average of these two voted
images. We continue this search-and-vote process several times to
minimize the third term in Eq. 1 [Shechtman et al. 2010]. The sec-
ond stage, similar to Sen et al. [2012], consists of merging all the
voted images and the reference image into an HDR image at each
frame. This process simultaneously minimizes the second term of
Eq. 1 and ensures that the first term is satisfied by injecting the
well-exposed pixels of the reference image into the HDR frame.
The algorithm iterates between these two stages until it converges.
Figure 5: The $\alpha_{\mathrm{ref},n}$ curves. (left) Sen et al. [2012], (middle) for search windows smaller than $b_{\max}$, (right) for search windows of size $b_{\max}$. Note the curves only differ in the under-exposed regions and they are the same as Sen et al. in the over-exposed regions.

Our algorithm begins by initializing all of the LDR images to the exposure-adjusted version of the reference image from the same frame. Then, for each LDR image $L_{m,n}$, we perform two bidirectional constrained patch searches against $L_{m,n+1}$ and $L_{m,n-1}$. These constrained searches are performed in a window (Sec. 3.2) around the initial motion flow estimate (Sec. 3.1). Next, in the voting process, the searched patches for completeness and coherence (the first and second terms in Eq. 3, respectively) are averaged with weights to generate a voted image for each BDS term in Eq. 2. The LDR image $L_{m,n}$ is then replaced with the average of these two voted images. We continue this search-and-vote process several times until convergence.

In the next step, the averaged images from all $M$ LDR sources in each frame ($L_{m,n}$, $m \in \{1, \ldots, M\}$) are combined using the HDR merge process, as proposed by Sen et al. [2012], to form an intermediate HDR frame $H_n$. The HDR merge process injects the well-exposed pixels of the reference image $L_{\mathrm{ref},n}$ into the HDR frame. For the over/under-exposed regions, we blend the reference image with the other LDR images in that frame using $\alpha_{\mathrm{ref},n}$ (shown in Fig. 5 (middle)). Finally, we replace each missing LDR image $L_{m,n}$ with $l_m(H_n)$, which maps the radiance values of $H_n$ to the exposure range of $m$. This process continues iteratively and in a multiscale fashion to minimize Eq. 1. Note that at coarse scales we reduce the size of the window according to the resolution of the image at that scale. At the coarsest scale, our images have 150 pixels in the smaller dimension, and we have a total of 6 scales with a ratio of $\sqrt[5]{x/150}$, where $x$ is the minimum dimension of the input frames. We use 20 iterations at the coarsest scale and linearly decrease this to 5 at the finest scale. Because we constrain the search to a small window around the initial flow, our optimization converges faster and with fewer iterations and scales relative to Sen et al.
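The multiscale schedule described in this paragraph can be sketched as follows; the function name and the rounding of iteration counts are ours.

```python
import numpy as np

def multiscale_schedule(min_dim, n_scales=6, coarsest_min_dim=150,
                        iters_coarse=20, iters_fine=5):
    """Per-scale resolution factor and iteration count (Sec. 3.3 sketch).

    The coarsest scale has 150 pixels on the smaller image dimension and the
    scale ratio is the fifth root of (min_dim / 150), so six scales span the
    range up to full resolution.  Iterations decrease linearly from 20 at the
    coarsest scale to 5 at the finest."""
    ratio = (min_dim / coarsest_min_dim) ** (1.0 / (n_scales - 1))
    schedule = []
    for s in range(n_scales):
        scale = (coarsest_min_dim * ratio ** s) / min_dim   # fraction of full res
        iters = int(round(np.interp(s, [0, n_scales - 1],
                                    [iters_coarse, iters_fine])))
        schedule.append((scale, iters))
    return schedule

# Example for input frames whose smaller dimension is 1080 pixels:
for scale, iters in multiscale_schedule(1080):
    print(f"scale {scale:.3f} of full resolution, {iters} search-and-vote iterations")
```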
Under-exposed regions must be treated carefully when estimating the HDR image to avoid artifacts from the alternating exposures. The parameter $\alpha_{\mathrm{ref},n}$ in Eq. 1 determines what is over/under-exposed and, therefore, controls the contribution of the reference image $L_{\mathrm{ref},n}$ to the HDR image. Sen et al. used the fixed trapezoid function shown in Fig. 5 (left) as $\alpha_{\mathrm{ref},n}$ (see Eq. 1), with a valid range of 0.1 to 0.9. This means that their method relies heavily on the reference image in the dark regions, which can be problematic when the reference image has low exposure. As can be seen in Fig. 3 (top), this washes out the details in the dark regions. Instead, to suppress the noise in the final HDR result, we set the minimum value of the valid range to 0.2 and use $(L_{\mathrm{ref},n}^{(p)} / 0.2)^2$ as $\alpha_{\mathrm{ref},n}$ in the under-exposed regions ($L_{\mathrm{ref},n}^{(p)} < 0.2$), as shown in Fig. 5 (middle). Moreover, in the places where the search map has a large window $b_{\max}$, we use the $\alpha_{\mathrm{ref},n}$ curve shown in Fig. 5 (right), which uses $(L_{\mathrm{ref},n}^{(p)} / 0.2)^{0.5}$ in the under-exposed regions. The reason is that the areas with large search windows are often occluded or undergoing very complex motion, so the reference needs to be injected more strongly to avoid deviating from it. Since the motion is usually fast in these regions, artifacts are difficult to perceive.
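A sketch of the modified $\alpha_{\mathrm{ref},n}$ curves is given below. The behavior below 0.2 follows the text; the shape of the over-exposed ramp above 0.9 is not spelled out in this excerpt, so a linear falloff is assumed there, and the function name is ours.

```python
import numpy as np

def alpha_ref(L, large_window=False, lo=0.2, hi=0.9):
    """Per-pixel alpha_{ref,n} sketch (Fig. 5, middle and right curves).

    `L` is the reference LDR image in [0, 1].  Below `lo` the curve follows
    (L / 0.2)^2 for normal search windows and (L / 0.2)^0.5 where the search
    window map equals b_max; above `hi` a linear falloff to zero at 1.0 is
    assumed (that side of the trapezoid is unchanged from Sen et al.)."""
    a = np.ones_like(L, dtype=float)
    under = L < lo
    exponent = 0.5 if large_window else 2.0
    a[under] = (L[under] / lo) ** exponent
    over = L > hi
    a[over] = np.clip((1.0 - L[over]) / (1.0 - hi), 0.0, 1.0)  # assumed ramp
    return a
```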
Although we constrain the patch search to a small window around
the rough initial motion flow, the HDR results might still exhibit a
small amount of jittering. This jittering occurs in the under- and
over-exposed regions of the reference image, where the valid infor-
