Proceedings Article•DOI•

A non-local MRF model for heritage architectural image completion

Deepan Gupta¹, Vaidehi Chhajer¹, Anand Mishra¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

16 Dec 2012-pp 61

TL;DR: This work proposes a non-local MRF model for the image completion problem. It represents the patches in the target region of the image as random variables in an MRF and introduces a novel energy function on these variables.


Abstract: MRF models have shown state-of-the-art performance for many computer vision tasks. In this work, we propose a non-local MRF model for the image completion problem. The goal of image completion is to fill a user-specified "target" region with patches of "source" regions in a way that is visually plausible to an observer. We represent the patches in the target region of the image as random variables in an MRF, and introduce a novel energy function on these variables. Each variable takes a label from a label set which is a collection of patches of the source region. The quality of the image completion is determined by the value of the energy function. The non-locality in the MRF is achieved through long range pairwise potentials. These long range pairwise potentials are defined to capture the inherent repeating patterns present in heritage architectural images. We minimize this energy function using Belief Propagation to obtain globally optimal image completion. We have tested our method on a wide variety of images and shown superior performance over previously published results for this task.


Summary (4 min read)


1. INTRODUCTION

Image completion is an important and challenging computer vision task.
Image completion is an important part of many computer vision applications such as scratch removal, object removal and reconstruction of damaged architectural parts in an image.
The authors also give the reasoning behind long range potentials and define a novel non-local MRF energy function such that its minimum corresponds to the globally optimal image completion.
The authors then discuss experimental settings and present results of proposed method in Section 5.

Statistical Methods.

These methods make use of parametric statistical models for image completion.
These models (wavelet coefficients [13], colour histogram [6]) are used for representation of image characteristics.
Initially, the output image is generated keeping the missing regions as pure noise.
Statistical methods are only useful in case of texture synthesis.
Moreover, they produce blurred outputs for natural images.

PDE-Based Methods.

Partial Differential Equation (PDE) based methods use a diffusion process for image completion.
The boundary filling uses a diffusion process simulated by solving partial differential equations.
Chan et al. [3] use a variational model for region filling.
PDE-based approaches perform well in cases where the missing region is smooth and non-textured.
They fail in case of large inpainting regions.

Exemplar-Based Methods.

Exemplar based techniques have been the most successful approaches in presence of large unknown regions.
Exemplar based techniques have been widely used for image completion recently.
Finally, the greedy approach leads to a bias caused due to selection of a few incorrect patches in the priority based mechanism.
These methods do not take advantage of repeating patterns inherently present in many architectural images.
Contrary to this, the authors incorporate the repetition present in the image by including long-range potentials and find the global minima of the non-local MRF energy.

3. THE IMAGE COMPLETION PROBLEM

Given a source region S and a target region T the image completion problem is to fill the target region such that it agrees with its surroundings.
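The set of sites is $S = \{X_1, X_2, \ldots, X_m\}$, where each $X_i$ is a spatial position of size w×h in the image.
Figure 2 shows the formulation of image completion as a labeling problem.
The labeling is found approximately by minimizing an MRF (Gibbs) energy of the form
$$E(x) = \sum_{i=1}^{m} E_i(x_i) + \sum_{(i,j) \in N} E_{ij}(x_i, x_j). \quad (1)$$
Here $E_i(\cdot)$, $E_{ij}(\cdot)$ and $N$ correspond to the data term, the smoothness term and the neighborhood system defined in the MRF, respectively.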

Data term.

The data term computation for image completion problem is not straightforward because only boundary sites are visible whereas interior sites are hidden.
(Recall that in any image completion problem user provides a mask which needs to be filled).
Without loss of generality, each patch (each $x_i$ and each $L_i$) can be represented as a vector of length $l = w \times h$.
The authors define the unary cost of random variable $x_i$ taking label $L_i$ as follows.
$$E_i(x_i = L_i) = \sum_{m=0}^{l} k_m (x_{im} - L_{im})^2. \quad (2)$$
In other words, the data term measures the agreement between random variable $x_i$ and label $L_i$ as the sum of squared distances (ssd) over known pixels, where $k_m$ is 1 for a visible pixel and 0 for a hidden one.

Smoothness term.

The data term alone cannot give coherent completion.
To enforce coherency in the completed image, the authors define a smoothness term such that overlapping region of neighboring labels have least sum of squared distance.
The smoothness term sums the squared differences between the two neighboring labels over their overlapping region ψ.
The process of data and smoothness term computation is pictorially depicted in Figure 3.

Long range potentials.

In addition to the data and smoothness terms, the authors wish to capture the inherent repetitive patterns present in heritage architectural image.
To achieve this, the authors add an extra term to the MRF energy, which they call the long range pairwise potential.
The long range pairwise potentials are defined between a patch and the patch at its repeating offset τ.
(The repeating offset computation is described in the next subsection.)
Once the energy is formulated, the problem of image completion becomes equivalent to finding the configuration x∗ corresponding to the global minimum of the energy function.

3.1 Repeating Offset Computation

Many archaeological monument images contain repeating patterns.
These repeating patterns vary in complexity which makes image completion a challenging task.
The repetitions can be of any size and along any direction.
The authors use the fact that patches which are part of the repetition will repeat with some common offset.

1. Finding Nearest Similar Patches.

For every patch in the source region of the image the authors find the nearest most similar patch.
The similarity is defined using the sum of squared differences (ssd) between the patches.
A threshold radius θ excludes nearby patches, which are likely to be similar but do not contribute towards the repetition offset.
In their testing, θ was set to 1/15th of the maximum of image height and image width.
Because a brute force search must traverse the entire source region for each patch, the authors use Approximate Nearest Neighbour (ANN) search to reduce the computation cost.

2. Histogram and Offset Generation.

Once the authors obtain the offsets corresponding to each pixel in the source region, they need to combine the results in order to obtain the correct repetition offsets.
H(τ ) gives the count of the number of patches having their individual offset as τ .
Now, to get the prominent repetition offsets, the authors analyze the histogram counts of the offsets and select offsets with highest count.
Since the image may contain several prominent repetitive patterns, the authors keep the top C offsets to capture varied repetitions (C = 10 was used in their experiments).

3.2 Graph Construction and inference

The authors solve the energy minimization problem on a corresponding graph, where each random variable is represented as a node in the graph.
To capture repeating patterns in the image, the authors also join non-local nodes at offset τ .
The authors further group nodes in this graph into two categories: visible and hidden.
The cost of a node taking some label Li is determined by the unary cost defined in Equation 2. Similar to [10], if a node is highly likely to take some label, the authors declare that node “committed” and give it higher priority in the subsequent inference procedure.

Inference.

In their experiments, the patch size is set dynamically as per the image resolution and aspect ratio with the minimum dimension of 4×.
For all the examples, the belief thresholds for pruning and confidence is set to −2ssd0 and −ssd0 respectively, where −ssd0 represents a predefined mediocre ssd between the patches.
Figure 7 shows the results of object removal using their method.
Apart from object removal and ruined wall reconstruction the authors also use their method for an interesting application known as background replacement.
The authors also study the importance of long range potentials.

4. SUB-MODULARITY AND METRICITY

The authors prove that the non-local MRF energy function defined in Equation 5 is sub-modular and semimetric.
Further, since the sum of sub-modular functions is a sub-modular function [16], the energy function defined in Equation 5 is a sub-modular energy function for every pair of labels.
The energy function defined in Equation 5 is basically composed of sum of squared distance (ssd) between two vectors, thus it would be sufficient to prove ssd as a semi-metric.
Then, since the Euclidean distance satisfies the triangle inequality, the authors can write.
The proof of sub-modularity and semi-metricity of the energy functions also guarantees that popular move making algorithm α-β swap can be efficiently used to find the global minima of this energy with a constant approximation [2].
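As a quick check of the semi-metric claim, the two defining properties can be verified directly for ssd. The vectors a, b and c below are generic patch vectors introduced only for illustration; this is a sketch of the standard argument, not a reproduction of the authors' proof.

```latex
\mathrm{ssd}(a,b) = \sum_{m} (a_m - b_m)^2 \;\ge\; 0, \qquad
\mathrm{ssd}(a,b) = 0 \iff a = b, \qquad
\mathrm{ssd}(a,b) = \mathrm{ssd}(b,a).
% The triangle inequality can fail, so ssd is a semi-metric but not a metric.
% Scalar example: a = 0,\ b = 1,\ c = 2 gives
% \mathrm{ssd}(a,c) = 4 > \mathrm{ssd}(a,b) + \mathrm{ssd}(b,c) = 1 + 1 = 2.
```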

5. EXPERIMENTS AND RESULTS

The authors present a detailed evaluation of their method on a large collection of images captured from Indian heritage sites.
To show the generality of the method, the authors also include few synthetic images and natural images in their test datasets.
Given an image and user provided mask, their problem is to complete the masked region in a way that is visually plausible to observer.
The authors evaluate various components of their approach to justify their choices.
The dataset for their experiments comprises a large variety of images of Indian heritage sites, including Hampi, Konark and Golkonda Fort.

Approximate Nearest Neighbour.

In the process of repetition offset computation, the authors use an Approximate Nearest Neighbour (ANN) technique to find the most similar patches.
For a resolution of 100 × 100, a brute force method takes around 2 minutes to process the entire image and generate the offsets.
With ANN, the time is reduced to 0.1 seconds.
The threshold radius (θ) is set to 1/15th of the maximum of the image width and height.
C = 10 most frequent offsets are chosen for their experiments.

6. CONCLUSIONS

In this work the authors address the problem of image completion.
The image completion problem is formulated in a principled framework.
The authors model the repeating patterns inherently present in images using long range potentials and solve the problem in non-local MRF framework.
The authors prove that the proposed MRF energy is sub-modular and semi-metric.
Experimental results on a wide collection of images show that the method clearly outperforms popular techniques such as exemplar based inpainting [4].


Figures (11)

Figure 1: Many successful applications of our method. (a) Object removal: the signboard on the window is successfully removed. (b) Reconstruction: the originally broken Colosseum has been successfully reconstructed using our approach.

Figure 8: Successful reconstruction of ruined walls of Golkonda fort. (a) The original image with selected mask. (b) Output of exemplar based [4] inpainting mask. (c) Output of proposed approach.

Figure 7: Successful object removal. (a) The original image with selected mask. (b) Output of exemplar based inpainting mask. (c) Output of proposed approach. We see that the exemplar based approach [4] being local is not able to synthesize the missing region effectively whereas ours generates results with better coherency by performing a global optimization.

Figure 3: Data term and smoothness term computations. Data term is agreement from the labels to the node in terms of ssd. Only visible area contributes to ssd. Smoothness term is computed based on ssd in overlapping regions of labels. (note that labels here are basically a collection of patches) (Best viewed in colour).

Figure 2: Image Completion as a labeling problem. Overlapping patch positions in image and patches sampled from source region represent sites and labels respectively. The labeling problem here is to find the optimal labeling from sites to labels.

Figure 11: Failure case. (a) The original image with selected mask. (b) Output of exemplar based approach (c) Output of proposed approach. The current algorithm does not take care of patterns having a rotation relationship and hence fails for the above case.

Figure 10: Background replacement. (a) An Indian lady in front of Taj Mahal. The background is marked by the user. (b) A natural scene image taken as source region S for image completion. (c) The background of the Taj Mahal is replaced by natural scene by the proposed method.

Figure 9: Importance of long range potentials. (a) The original image with selected mask. (b) Output image without long range potentials. (c) Output of proposed approach(with long range potentials). (d-e) The regions of interest (shown in yellow rectangles) are zoomed (Best viewed in colour).

Figure 4: Many architectural images contain repeating patterns. One such example is shown here (Best viewed in colour).

Figure 5: The proposed graphical model. There are two types of nodes in the graph: visible (in boundary and source region) and hidden (in interior region of target). Hidden nodes are shown by filled circles. Local pairwise potentials are shown via red edges and non-local long range potentials are shown via green edges. We use loopy belief propagation for inference in this graphical model (Best viewed in colour).

Figure 6: Overview of our method. First user selects a mask. Based on user selected mask a graph is constructed where each node represent an overlapping spatial position in the image. These nodes are connected via a 4-neighborhood system N . Moreover, to capture inherent repeatability, two nodes at distance of repeating offset are also connected (these edges are shown in yellow colour). To find repeating pattern in the image we use approximate nearest search (ANN). After graph construction, graph is labeled using popular inferencing technique: BP. The final output of our method is shown in the right most image (Best viewed in colour).


A Non-local MRF model for Heritage Architectural Image Completion

Deepan Gupta∗, Vaidehi Chhajer∗, Anand Mishra†, C. V. Jawahar‡

Center for Visual Information Technology, IIIT Hyderabad, India

http://cvit.iiit.ac.in/

ABSTRACT

MRF models have shown state-of-the-art performance for many computer vision tasks. In this work, we propose a non-local MRF model for image completion problem. The goal of image completion is to fill user specified “target” region with patches of “source” regions in a way that is visually plausible to an observer. We represent the patches in the target region of the image as random variables in an MRF, and introduce a novel energy function on these variables. Each variable takes a label from a label set which is a collection of patches of the source region. The quality of the image completion is determined by the value of the energy function. The non-locality in the MRF is achieved through long range pairwise potentials. These long range pairwise potentials are defined to capture the inherent repeating patterns present in heritage architectural images. We minimize this energy function using Belief Propagation to obtain globally optimal image completion.

We have tested our method on a wide variety of images and shown superior performance over previously published results for this task.

Keywords

Inpainting, MRF, Belief Propagation

1. INTRODUCTION

Image completion is an important and challenging computer vision task. The goal of an image completion algorithm is to reconstruct the missing regions within an image in a way that is visually plausible to an observer. In most cases, the missing region (called the target region) is filled in by using the information from the rest of the image (called the source region). Image completion is an important part of many computer vision applications such as scratch removal, object removal and reconstruction of damaged architectural parts in an image. Moreover, image completion has applications in the field of photo editing and restoration.

Figure 1: Many successful applications of our method. (a) Object removal: the signboard on the window is successfully removed. (b) Reconstruction: the originally broken Colosseum has been successfully reconstructed using our approach.

∗ Equal contribution

∗ {deepan.gupta,vaidehi.chhajer}@students.iiit.ac.in

† anand.mishra@research.iiit.ac.in

‡ jawahar@iiit.ac.in

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ICVGIP ’12, December 16-19, 2012, Mumbai, India

Image completion is a highly researched area in computer vision [4, 5, 7, 8, 10, 11]. However, due to the complexity of the images, the results leave a lot to be desired. In this work, we propose a non-local MRF technique to complete images. (Note that non-local MRF [15] has been successfully applied to image restoration in the past.) The beauty of our formulation is in its capability to use non-local repeating patterns in an MRF energy minimization framework. We use Belief Propagation [12] to find the minimum of this energy, i.e., the optimal image completion. A few successful applications of our method are shown in Figure 1. It can be seen that our method performs well for tasks like object removal and reconstruction.

India has some of the greatest architectural monuments in the world. With advances in computer vision techniques, it is now possible to capture the glory of heritage architectural images for both showcasing and preservation for future generations. Our work can be considered as a small step towards this. We primarily focus on completion of images taken from Indian heritage architectures, although we also test our method on many other images to show the generality of the method.

Contributions. The contribution of this work is twofold. (1) We propose a non-local optimization framework to capture repeating patterns inherently present in the image (especially the images of our interest, i.e., heritage architectural images). We model these repetitions via long range potentials in an MRF. (2) We prove that the proposed energy function is sub-modular as well as semi-metric. This proof guarantees that the energy function can be efficiently minimized via move making algorithms like α-β swap [2]. However, the study of energy minimization techniques is beyond the scope of this work. Hence, we restrict ourselves to Belief Propagation for the implementation of our method.

Outline of the paper. The remainder of the paper is organized as follows. We discuss related work in Section 2. In Section 3, the image completion problem is formulated as a labeling problem. In this section, we also give the reasoning behind long range potentials and define a novel non-local MRF energy function such that its minimum corresponds to the globally optimal image completion. In Section 4, we prove that the proposed non-local MRF energy function is sub-modular and semi-metric. We then discuss experimental settings and present results of the proposed method in Section 5. Finally, Section 6 concludes our work.

2. RELATED WORK

There has been significant research in the field of image completion. Various approaches have been put forward over the last few decades. The approaches can be grouped into four major categories: (1) Statistical Methods, (2) Partial Differential Equation (PDE)-Based Methods, (3) Exemplar-Based Methods, (4) Global Optimization Based Methods.

Statistical Methods.

These methods make use of parametric statistical models for image completion. These models (wavelet coefficients [13], colour histogram [6]) are used for representation of image characteristics. The idea is to estimate the missing region and fill it using an iterative process. Initially, the output image is generated keeping the missing regions as pure noise. These regions undergo iterative noise reduction to produce the final output. Statistical methods are only useful in case of texture synthesis. Moreover, they produce blurred outputs for natural images.

PDE-Based Methods.

Partial Differential Equation (PDE) based methods use a diffusion process for image completion. The idea is to start the region-filling from the boundary of the missing region and then propagate towards the interior. The boundary filling uses a diffusion process simulated by solving partial differential equations. In [1] region-filling is done by propagating image Laplacians in the direction of the isophotes. Chan et al. [3] use a variational model for region filling.

PDE-based approaches perform well in cases where the missing region is smooth and non-textured. However, they fail in case of large inpainting regions.

Exemplar-Based Methods.

Exemplar based techniques have been the most successful approaches in presence of large unknown regions. The visible patches of the image are used as a training set to infer the unknown parts, which are then filled by simply copying the content of these known patches. Exemplar based techniques have been widely used for image completion recently. Criminisi et al. [4] propose a priority-based mechanism which combines texture synthesis and isophote driven inpainting for image completion. This approach, though isophote driven, is not capable of maintaining the structural consistency of the image. Hung et al. [8] propose Bezier curves to determine missing edge information, hence preserving structure consistency. The damaged regions are then inpainted using exemplar based methods. However, there are three major pitfalls of these methods. Firstly, the confidence map is computed based on heuristics and may not be applicable to a general case. Secondly, once a patch has been assigned to an unknown region, it cannot be changed. Finally, the greedy approach leads to a bias caused by the selection of a few incorrect patches in the priority based mechanism. These incorrect completions have a spiraling effect which destabilizes the inpainting process.

Global Optimization based methods.

There has been huge interest in the discrete optimization community for the image completion problem in recent years [5, 14]. However, these methods do not take advantage of repeating patterns inherently present in many architectural images. On the other hand, we model the repetitions in the energy function itself and thus define a better energy function for the problem. Closest to our work is [10]. Here the authors try to tackle the drawbacks of the exemplar-based approach by posing image-filling as a discrete global optimization problem. It uses the exemplar-based framework and a Markov Random Field (MRF) for image completion. The idea is to minimize the energy of the MRF using the Priority-Belief Propagation (Priority-BP) optimization scheme. The approach works well for the majority of cases. However, in images where repetitions are prominent, it produces relatively poor output. The method ignores the fact that repetition, if present, may carry critical information about the missing region. Contrary to this, we incorporate the repetition present in the image by including long-range potentials and find the global minimum of the non-local MRF energy.

3. THE IMAGE COMPLETION PROBLEM

Given a source region S and a target region T, the image completion problem is to fill the target region such that it agrees with its surroundings. We define the image completion problem in a labeling problem framework where overlapping spatial positions in the image can be considered as a set of sites and patches of size w × h sampled from the source region can be considered as labels. In other words, the set of sites is $S = \{X_1, X_2, \ldots, X_m\}$, where each $X_i$ is a spatial position of size w × h in the image. Similarly, the label set $L = \{L_1, L_2, \ldots, L_n\}$ is a set where each $L_i$ is a patch of size w × h sampled from the source region S. Figure 2 shows the formulation of image completion as a labeling problem.

Each site $X_i$ can take a random value $x_i \in \{L_1, \ldots, L_n\}$. The labeling problem here is to find the optimal function $f^* : S \to L$. The optimality criterion is defined based on the quality of the image completion. In general, this is an NP-hard problem. However, it can be solved approximately by finding the minimum of the Gibbs energy (also known as MRF energy) of the following form:

$$E(x) = \sum_{i=1}^{m} E_i(x_i) + \sum_{(i,j) \in N} E_{ij}(x_i, x_j). \quad (1)$$

Here $E_i(\cdot)$, $E_{ij}(\cdot)$ and $N$ correspond to the data term, the smoothness term and the neighborhood system defined in the MRF, respectively. The data term measures the agreement with the available observations, whereas the smoothness term is used to enforce spatial coherence. The minimum of this energy function corresponds to the optimal image completion.

Data term.

The data term computation for the image completion problem is not straightforward because only boundary sites are visible whereas interior sites are hidden. (Recall that in any image completion problem the user provides a mask which needs to be filled.) Without loss of generality, each patch (each $x_i$ and each $L_i$) can be represented as a vector of size w×h, i.e. patches can be represented as vectors in a vector space as follows.

$$x_i = [x_{i1}, \ldots, x_{il}]$$

$$L_i = [L_{i1}, \ldots, L_{il}]$$

where $l = w \times h$ and each $x_{im}$ is either hidden or in [0, 255], whereas $L_i \in [0, 255]^l$, ∀i. To distinguish visible and hidden nodes, we introduce a binary vector $K = \{k_1, k_2, \ldots, k_l\}$ such that $k_m$ takes value 0 if $x_{im}$ is hidden and 1 otherwise.

We define the unary cost of random variable $x_i$ taking label $L_i$ as follows.

$$E_i(x_i = L_i) = \sum_{m=0}^{l} k_m (x_{im} - L_{im})^2. \quad (2)$$

In other words, the data term measures the agreement between random variable $x_i$ and label $L_i$ in terms of the sum of squared distance (ssd) of known pixels. Thus, the cost of $x_i$ taking label $L_i$ is low if the sum of squared distance (ssd) between $x_i$ and $L_i$ is low.
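The masked sum of squared distances in Equation 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `data_term` and the argument `visible_mask` (playing the role of the binary vector K) are hypothetical names chosen for this example.

```python
import numpy as np

def data_term(site_patch, visible_mask, label_patch):
    """Masked SSD between a site patch and a candidate label patch (cf. Equation 2).

    visible_mask holds 1 for known pixels and 0 for hidden ones (assumed encoding of K).
    """
    x = np.asarray(site_patch, dtype=np.float64).ravel()
    k = np.asarray(visible_mask, dtype=np.float64).ravel()
    lab = np.asarray(label_patch, dtype=np.float64).ravel()
    # Hidden pixels may hold arbitrary placeholder values, so their
    # differences are forced to zero before squaring.
    diff = np.where(k > 0, x - lab, 0.0)
    return float(np.sum(diff ** 2))
```

Only the visible pixels contribute, so a site that lies entirely inside the hole gets a data cost of zero for every label and is constrained only through its pairwise terms.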

Smoothness term.

The data term alone cannot give coherent completion. To enforce coherency in the completed image, we define a smoothness term such that the overlapping regions of neighboring labels have the least sum of squared distance. We define the smoothness term as follows.

$$E_{ij}(x_i = L_i, x_j = L_j) = \sum_{m=0}^{\mathrm{size}(\psi)} \delta(X_{im} \in \psi) \wedge \delta(X_{jm} \in \psi)\,(L_{im} - L_{jm})^2. \quad (3)$$

Here ψ is the overlapping region between sites (i.e. patches) $X_i$ and $X_j$. δ(·) is an indicator function. The process of data and smoothness term computation is pictorially depicted in Figure 3.
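A minimal sketch of the overlap-based smoothness cost follows. It assumes that neighboring sites are displaced by a fixed `stride` (in pixels) that is smaller than the patch size, so that the overlap ψ is a rectangular strip; the function name and the `stride` parameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def smoothness_term(label_a, label_b, stride, horizontal=True):
    """SSD over the overlap when label_b sits `stride` pixels right of (or below) label_a."""
    a = np.asarray(label_a, dtype=np.float64)
    b = np.asarray(label_b, dtype=np.float64)
    if horizontal:
        overlap_a = a[:, stride:]                 # right strip of the left patch
        overlap_b = b[:, :a.shape[1] - stride]    # left strip of the right patch
    else:
        overlap_a = a[stride:, :]                 # bottom strip of the upper patch
        overlap_b = b[:a.shape[0] - stride, :]    # top strip of the lower patch
    return float(np.sum((overlap_a - overlap_b) ** 2))
```

Two labels that agree exactly on the shared strip cost nothing, which is what favors seamless transitions between neighboring patches.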

Figure 2: Image Completion as a labeling problem. Overlapping patch positions in image and patches sampled from source region represent sites and labels respectively. The labeling problem here is to find the optimal labeling from sites to labels.

Figure 3: Data term and smoothness term computations. Data term is agreement from the labels to the node in terms of ssd. Only visible area contributes to ssd. Smoothness term is computed based on ssd in overlapping regions of labels. (note that labels here are basically a collection of patches) (Best viewed in colour).

Long range potentials.

In addition to the data and smoothness terms, we wish to capture the inherent repetitive patterns present in heritage architectural images. To achieve this, we add an extra term in the MRF energy which we call the long range pairwise potential. The long range pairwise potentials are defined between a patch and its repeating offset at distance τ. (We describe the repeating offset computation in the next subsection.) This long range potential forces a node to take a similar label to a patch at offset τ. Mathematically, the long range potential $E^{\mathrm{LR}}_{ij}(\cdot, \cdot)$ is defined as follows.

$$E^{\mathrm{LR}}_{ij}(x_i = L_i, x_j = L_j) = \sum_{m=1}^{l} (L_{im} - L_{jm})^2. \quad (4)$$

Note that here $x_i$ and $x_j$ are non-local, i.e. they are at distance τ. (Here τ is a repeating offset, in other words the image has a repeating pattern at offset τ.) This definition of the long range potential ensures less penalty if $x_i$ and $x_j$ take similar labels.

Figure 4: Many architectural images contain repeating patterns. One such example is shown here (Best viewed in colour).

Thus we modify Equation 1 to the non-local MRF energy as follows.

$$E(x) = \sum_{i=1}^{m} E_i(x_i) + \sum_{(i,j) \in N} E_{ij}(x_i, x_j) + \sum_{\mathrm{dist}(x_i, x_j) = \tau} E^{\mathrm{LR}}_{ij}(x_i, x_j). \quad (5)$$

Once the energy is formulated, the problem of image completion becomes equivalent to finding the configuration $x^*$ corresponding to the global minimum of the energy function. The graph construction corresponding to this energy function and the inference (energy minimization) are discussed in Section 3.2.
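To make the three-part energy concrete, the sketch below evaluates the data, smoothness and long range terms for one candidate labeling. It is an illustration under stated assumptions: the precomputed `unary` table, the edge lists, and the callable `smoothness` are hypothetical inputs, and the long range cost is taken to be the plain ssd between the two chosen label patches, as the text describes.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff ** 2))

def nonlocal_energy(labeling, unary, local_edges, smoothness, long_edges, labels):
    """Evaluate data + smoothness + long range terms for one labeling.

    labeling    : label index chosen for each site
    unary       : array [n_sites, n_labels] of data costs
    local_edges : (i, j) site pairs in the neighborhood system N
    smoothness  : callable (i, j, label_i, label_j) -> cost
    long_edges  : (i, j) site pairs separated by a repeating offset
    labels      : array [n_labels, l] of flattened label patches
    """
    data = sum(unary[i, labeling[i]] for i in range(len(labeling)))
    smooth = sum(smoothness(i, j, labeling[i], labeling[j]) for i, j in local_edges)
    long_range = sum(ssd(labels[labeling[i]], labels[labeling[j]]) for i, j in long_edges)
    return float(data + smooth + long_range)
```

An inference routine such as Belief Propagation searches over labelings for a low value of this quantity; the function above only scores a labeling and does not perform the minimization.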

3.1 Repeating Offset Computation

Many archaeological monument images contain repeating patterns. One such example is shown in Figure 4. These repeating patterns vary in complexity, which makes image completion a challenging task. The repetitive pattern may carry significant information about the region which is to be completed (inpainted). Hence, the idea is to make use of the various repetitions present in the input image to boost the region-filling.

The repetitions can be of any size and along any direction. In order to capture both, we make use of offset pairs τ = (p, q) (the repeating offsets), whose components correspond to the x-direction and y-direction repetition offsets respectively.

In this step, we compute offsets that can effectively represent the inherent repetition in the image. We use the fact that patches which are part of the repetition will repeat with some common offset. The distance between a patch and its nearest similar patch will account for the repetition offset. Offset generation consists of the following steps.

1. Finding Nearest Similar Patches.

For every patch in the source region of t he image we ﬁnd

the nearest most similar patch. For every patch P belonging

to the source region,

τ(x) = arg min

||P (x) − P (x + τ )||

; |τ| > θ.

Here τ is a 2D-coordinate oﬀset (p, q), P (x) is the patch

centered at x. P (x + τ(x)) is the nearest similar patch. τ

represents the oﬀset obtained for th e pixel x. The simi-

larity is deﬁned using the sum of squared diﬀerences (ssd)

between the patches. Lesser ssd correspond s to higher sim-

ilarity. The parameter θ represents the th reshold radius for

the nearest patch. The patch must lie out side this range.

Figure 5: The proposed graphical model. There are two types of nodes in the graph: visible (in boundary and source region) and hidden (in interior region of target). Hidden nodes are shown by filled circles. Local pairwise potentials are shown via red edges and non-local long range potentials are shown via green edges. We use loopy belief propagation for inference in this graphical model (Best viewed in colour).

The threshold $\theta$ serves to ignore nearby patches, which are likely to be similar but do not contribute towards the repetition offset. This threshold varies with the images based on their sizes. In our testing, $\theta$ was set to 1/15 of the maximum of the image height and image width.
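The search in step 1 can be sketched in plain Python as follows. This is a minimal brute-force illustration under stated assumptions, not the authors' implementation: the image is a grayscale list of rows, the whole image is treated as the source region, $|\tau|$ is taken to be the Euclidean length of the offset, and the helper names (`patch`, `ssd`, `threshold`, `nearest_offset`) are hypothetical.

```python
def patch(img, y, x, r):
    """Return the (2r+1) x (2r+1) patch centred at (y, x) as a flat list."""
    return [img[y + dy][x + dx]
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)]


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def threshold(img):
    """theta = 1/15 of the maximum of image height and width (value from the text)."""
    return max(len(img), len(img[0])) / 15


def nearest_offset(img, y, x, r, theta):
    """Brute-force search for the offset (p, q), with |tau| > theta,
    whose patch has the lowest ssd to the patch centred at (y, x)."""
    h, w = len(img), len(img[0])
    ref = patch(img, y, x, r)
    best, best_cost = None, float("inf")
    for yy in range(r, h - r):
        for xx in range(r, w - r):
            p, q = xx - x, yy - y          # p: x-direction, q: y-direction
            if p * p + q * q <= theta * theta:
                continue                   # too close: skip nearby patches
            cost = ssd(ref, patch(img, yy, xx, r))
            if cost < best_cost:
                best, best_cost = (p, q), cost
    return best
```

For example, on a synthetic image whose columns repeat every 4 pixels, such as `[[(x % 4) + 10 * y for x in range(16)] for y in range(6)]`, the offset found for an interior pixel is horizontal and a multiple of the period. The nested loops visit every candidate patch for a single query, which is the cost that motivates the approximate search.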

The brute force search for the nearest patch can be computationally expensive, as for each patch we need to traverse the entire source region. Therefore, to overcome the high computation cost, we use Approximate Nearest Neighbour (ANN). Using approximate nearest neighbour,

$$\tau(x) = \mathrm{ANN}(P(x)); \quad |\tau| > \theta.$$

2. Histogram and Offset Generation.

Once we obtain the offsets corresponding to each pixel in the source region, we need to combine the results in order to obtain the correct repetition offsets. To achieve this, we represent $\tau$ as a 2D-plane and generate a histogram count of all the $\tau$ offsets, i.e.

$$H(\tau) = \sum_{\tau(x)} \delta(\tau(x) = \tau).$$

$H(\tau)$ gives the number of patches having their individual offset as $\tau$. Now, to get the prominent repetition offsets, we analyze the histogram counts of the offsets and select the offsets with the highest counts. Since the image may contain many prominent repetitive patterns, we generate the top $C$ offsets to capture varied repetitions ($C = 10$ was used in our experiments).
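The histogram step can be sketched with Python's standard `collections.Counter`. This is an illustrative sketch only: the function name `top_offsets` and the representation of offsets as `(p, q)` tuples are assumptions, and the default `c=10` mirrors the value of $C$ reported above.

```python
from collections import Counter


def top_offsets(offsets, c=10):
    """Return the c most frequent offsets.

    offsets: iterable of (p, q) tuples, one per source-region pixel.
    """
    hist = Counter(offsets)  # H(tau): number of patches whose offset is tau
    return [tau for tau, _count in hist.most_common(c)]
```

Calling `top_offsets(per_pixel_offsets)` on the list of per-pixel offsets from step 1 would give the candidate repetition offsets used for the long range edges.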

3.2 Graph Construction and inference

We solve the energy minimization problem on a corresponding graph, where each random variable is represented as a node in the graph. Nodes in a 4-neighborhood system are connected via edges. To capture repeating patterns in the image, we also join non-local nodes at offset $\tau$. (This offset is computed based on the procedure described in Section 3.1.)

We further group nodes in this graph into two categories: visible and hidden. The nodes belonging to the source region and the boundary of the target region are visible, whereas the interior nodes of the target region are hidden. Each node takes a label from the label set $L = \{L_1, L_2, \ldots, L_n\}$, where each $L_i$ is a patch of size $w \times h$. The cost of a node taking some label $L_i$ is determined by the unary cost defined in Equation 2. Further, the joint costs of two neighboring nodes and of two non-local nodes taking labels $L_i$ and $L_j$, defined in Equations 3 and 4 respectively, give the weights of the edges. Similar to [10], if a node is highly likely to take some label, we declare that node “committed” and give it higher priority in the subsequent inference procedure. The proposed graphical model is shown and explained in Figure 5.

Figure 6: Overview of our method. First, the user selects a mask. Based on the user-selected mask, a graph is constructed where each node represents an overlapping spatial position in the image. These nodes are connected via a 4-neighborhood system $N$. Moreover, to capture inherent repeatability, two nodes at a distance of the repeating offset are also connected (these edges are shown in yellow colour). To find repeating patterns in the image, we use approximate nearest neighbour search (ANN). After graph construction, the graph is labeled using a popular inference technique: BP. The final output of our method is shown in the rightmost image (Best viewed in colour).
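The edge structure of such a graph can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes nodes lie on a regular `rows x cols` grid, that the repeating offsets have already been converted to node-grid units, and the function name `build_edges` is hypothetical. The visible/hidden grouping and the edge weights are left out.

```python
def build_edges(rows, cols, offsets):
    """Return (local_edges, long_range_edges) for a grid of nodes (r, c).

    offsets: list of (p, q) repeating offsets in node units,
             p along the x-direction and q along the y-direction.
    """
    local, long_range = set(), set()
    for r in range(rows):
        for c in range(cols):
            # 4-neighbourhood: linking right and down covers each edge once.
            if c + 1 < cols:
                local.add(((r, c), (r, c + 1)))
            if r + 1 < rows:
                local.add(((r, c), (r + 1, c)))
            # Non-local edges to the node displaced by each repeating offset.
            for p, q in offsets:
                rr, cc = r + q, c + p
                if 0 <= rr < rows and 0 <= cc < cols:
                    edge = tuple(sorted([(r, c), (rr, cc)]))
                    if edge not in local:
                        long_range.add(edge)
    return local, long_range
```

Storing each undirected edge as a sorted pair keeps the two edge sets free of duplicates, so an offset and its negation produce the same long range edges.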

Inference.

For inference in the proposed graphical model, we use a popular message-passing inference algorithm known as loopy belief propagation (BP). Belief propagation was first proposed in [12]. It iteratively tries to find the Maximum-a-Posteriori (MAP) estimate by propagating messages (beliefs) from each node to its neighbors. (Recall that in the MAP-MRF framework, the MAP estimate is equivalent to the global minimum of the MRF energy.) Although loopy BP is not theoretically guaranteed to converge on grids, it has been shown experimentally to yield a strong local minimum for a wide range of computer vision problems [17].
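The message passing idea can be illustrated with a minimal min-sum variant of loopy BP. This is a generic sketch under simplifying assumptions rather than the procedure used in the paper: all edges share one symmetric pairwise cost function, messages are updated synchronously for a fixed number of iterations, and the priority scheduling of “committed” nodes is omitted. The name `min_sum_bp` and its argument layout are hypothetical.

```python
def min_sum_bp(unary, edges, pairwise, iters=10):
    """Approximate energy minimisation by min-sum belief propagation.

    unary:    {node: [cost of label 0, cost of label 1, ...]}
    edges:    list of undirected (u, v) pairs
    pairwise: symmetric function (a, b) -> cost of adjacent labels a and b
    Returns {node: index of the label with the lowest final belief}.
    """
    n_labels = len(next(iter(unary.values())))
    nbrs = {u: [] for u in unary}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # msg[(u, v)][b]: message from u to v about v taking label b.
    msg = {(u, v): [0.0] * n_labels for u in nbrs for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            out = [
                min(unary[u][a] + pairwise(a, b)
                    + sum(msg[(k, u)][a] for k in nbrs[u] if k != v)
                    for a in range(n_labels))
                for b in range(n_labels)
            ]
            low = min(out)
            new[(u, v)] = [m - low for m in out]  # normalise to stay bounded
        msg = new
    labels = {}
    for u in nbrs:
        belief = [unary[u][a] + sum(msg[(k, u)][a] for k in nbrs[u])
                  for a in range(n_labels)]
        labels[u] = belief.index(min(belief))
    return labels
```

On a graph without loops, such as a short chain, this scheme recovers the exact minimum once the messages have propagated end to end; on loopy graphs it is only an approximation, in line with the convergence caveat above.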

The proposed method is summarized in Figure 6.

4. SUB-MODULARITY AND METRICITY

In this section, we prove that the non-local MRF energy function defined in Equation 5 is sub-modular and semi-metric.

Statement 1. The energy function defined in Equation 5 is a sub-modular function for every pair of labels.

Proof. A function of a single variable is trivially a sub-modular function [9]. Thus, it suffices to prove that the two pairwise terms $E(\cdot, \cdot)$, namely the smoothness term of Equation 3 and the long range potential of Equation 4, are sub-modular for every pair of labels. To prove sub-modularity, we need to prove the following:

$$E(L_i, L_i) \le E(L_i, L_j), \quad \forall i, j,\ i \ne j.$$

Since $E(\cdot, \cdot)$ is a sum of squared distances between two vectors, $E(L_i, L_i) = 0, \forall i$. Moreover, the sum of squared distances between any two unequal vectors is always positive, which implies that $E(\cdot, \cdot)$ is a sub-modular function. This proof of sub-modularity can be easily extended to the long range potentials without loss of generality. Further, since the sum of sub-modular functions is a sub-modular function [16], the energy function defined in Equation 5 is a sub-modular energy function for every pair of labels.
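As a worked check, the argument can be related to the pairwise condition commonly used for two-label moves. That form is not written out in the text and is assumed here for illustration: for any two labels $L_i$ and $L_j$,

$$E(L_i, L_i) + E(L_j, L_j) \le E(L_i, L_j) + E(L_j, L_i).$$

Using $E(L_i, L_i) = E(L_j, L_j) = 0$ and $E(L_i, L_j) \ge 0$ from the proof, the left-hand side is $0$ and the right-hand side is non-negative, so the condition holds for every pair of labels.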

Statement 2. The energy function defined in Equation 5 is a semi-metric.

Proof. The energy function defined in Equation 5 is essentially composed of sums of squared distances (ssd) between two vectors; thus it is sufficient to prove that ssd is a semi-metric. Here we show that ssd has all three necessary and sufficient properties of a semi-metric. We also show that ssd does not always satisfy the triangle inequality, and thus is not a metric.

1. Non-negativity: ssd between two vectors is always greater than or equal to zero, i.e.

$$\mathrm{ssd}(L_i, L_j) \ge 0, \quad \forall i \ne j.$$

2. Identity of indiscernibles: ssd between two vectors is equal to zero iff both the vectors are equal, i.e.,

$$\mathrm{ssd}(L_i, L_j) = 0 \iff i = j, \quad \forall i, j.$$

3. Symmetry: ssd between two vectors is a symmetric function, i.e.

$$\mathrm{ssd}(L_i, L_j) = \mathrm{ssd}(L_j, L_i), \quad \forall i, j.$$

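4. Triangle inequality: In order to attempt to prove this, let us first start with the Euclidean distance between two vectors. Let $\mathrm{dist}(L_i, L_j)$ be the Euclidean distance between vectors $L_i$ and $L_j$. Then, since the Euclidean distance satisfies the triangle inequality, we can write

$$\mathrm{dist}(L_i, L_j) \le \mathrm{dist}(L_i, L_k) + \mathrm{dist}(L_k, L_j), \quad \forall k.$$

Squaring the above inequality yields

$$\mathrm{ssd}(L_i, L_j) \le \mathrm{ssd}(L_i, L_k) + \mathrm{ssd}(L_k, L_j) + 2\,\mathrm{dist}(L_i, L_k)\,\mathrm{dist}(L_k, L_j).$$

Recall that ssd is the square of the Euclidean distance. Now, since Euclidean distances are non-negative, i.e. $2\,\mathrm{dist}(L_i, L_k)\,\mathrm{dist}(L_k, L_j) \ge 0$, this bound is looser than the triangle inequality for ssd; in other words, we can always find $L_i$, $L_j$ and $L_k$ such that ssd does not satisfy the triangle inequality.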
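A small numeric check makes the four properties concrete. The one-dimensional "patches" below are chosen purely for illustration and do not come from the paper; the helper `ssd` is a plain implementation of the sum of squared differences.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Three hypothetical one-pixel labels lying on a line.
li, lk, lj = [0.0], [1.0], [2.0]

non_negative = ssd(li, lj) >= 0
identity = ssd(li, li) == 0 and ssd(li, lj) > 0
symmetric = ssd(li, lj) == ssd(lj, li)
# ssd(li, lj) = 4 while ssd(li, lk) + ssd(lk, lj) = 1 + 1 = 2.
triangle_holds = ssd(li, lj) <= ssd(li, lk) + ssd(lk, lj)
```

Here the first three flags are true and `triangle_holds` is false, which matches the claim that ssd is a semi-metric but not a metric.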


Frequently Asked Questions (13)

Q1. What are the contributions in "A non-local MRF model for heritage architectural image completion"?

In this work, the authors propose a non-local MRF model for the image completion problem. The authors represent the patches in the target region of the image as random variables in an MRF, and introduce a novel energy function on these variables. The authors have tested their method on a wide variety of images and shown superior performance over previously published results for this task. The non-locality in the MRF is achieved through long range pairwise potentials. These long range pairwise potentials are defined to capture the inherent repeating patterns present in heritage architectural images.

Q2. What is the main purpose of the method?

Apart from object removal and ruined wall reconstruction the authors also use their method for an interesting application known as background replacement.

Q3. How many ssds are used in the example?

For all the examples, the belief thresholds for pruning and confidence are set to −2ssd0 and −ssd0 respectively, where −ssd0 represents a predefined mediocre ssd between the patches.

Q4. What is the ssd between two vectors?

Identity of indiscernibles: ssd between two vectors is equal to zero iff both the vectors are equal, i.e., $\mathrm{ssd}(L_i, L_j) = 0 \iff i = j, \forall i, j$.

Q5. What is the proof of sub-modularity and semi-metricity of the energy functions?

The proof of sub-modularity and semi-metricity of the energy functions also guarantees that the popular move-making algorithm α-β swap can be efficiently used to find the global minima of this energy with a constant approximation [2].

Q6. What is the definition of a smoothness term?

To enforce coherency in the completed image, the authors define a smoothness term such that the overlapping regions of neighboring labels have the least sum of squared distance.

Q7. How do the authors solve the energy minimization problem?

The authors solve the energy minimization problem on a corresponding graph, where each random variable is represented as a node in the graph.

Q8. How many repetitions are used in the graph?

Since the image may contain many prominent repetitive patterns, the authors generate the top C offsets to capture varied repetitions (C = 10 was used in their experiments).

Q9. What is the definition of the image completion problem?

The authors define the image completion problem in a labeling problem framework, where overlapping spatial positions in the image can be considered as a set of sites and patches of size w × h sampled from the source region can be considered as labels.

Q10. What is the cost of xi taking label Li?

$$E_i(x_i = L_i) = \sum_{m=0}^{l} k_m (x_{im} - L_{im})^2. \quad (2)$$

In other words, the data term measures the agreement between random variable $x_i$ and label $L_i$ in terms of the sum of squared distance (ssd) of known pixels.

Q11. How do the authors find the similar patches?

In the process of repetition offset computation, the authors use the Approximate Nearest Neighbour technique in order to find the most similar patches.

Q12. What are the datasets for their experiments?

The dataset for their experiments comprises a large variety of images of Indian heritage sites, including Hampi, Konark, Golkonda Fort etc.

Q13. What is the definition of the labeling problem?

The labeling problem here is to find the optimal function f∗ : S → L. The optimality criterion is defined based on the quality of the image completion.
