Intelligent Scissors for Image Composition

Abstract
We present a new, interactive tool called Intelligent Scissors which we use for image segmentation and composition. Fully automated segmentation is an unsolved problem, while manual tracing is inaccurate and unacceptably laborious. However, Intelligent Scissors allow objects within digital images to be extracted quickly and accurately using simple gesture motions with a mouse. When the gestured mouse position comes in proximity to an object edge, a live-wire boundary “snaps” to, and wraps around, the object of interest.

Live-wire boundary detection formulates discrete dynamic programming (DP) as a two-dimensional graph searching problem. DP provides mathematically optimal boundaries while greatly reducing sensitivity to local noise or other intervening structures. Robustness is further enhanced with on-the-fly training, which causes the boundary to adhere to the specific type of edge currently being followed, rather than simply the strongest edge in the neighborhood. Boundary cooling automatically freezes unchanging segments and automates input of additional seed points. Cooling also allows the user to be much more free with the gesture path, thereby increasing the efficiency and finesse with which boundaries can be extracted.

Extracted objects can be scaled, rotated, and composited using live-wire masks and spatial frequency equivalencing. Frequency equivalencing is performed by applying a Butterworth filter which matches the lowest frequency spectra to all other image components. Intelligent Scissors allow creation of convincing compositions from existing images while dramatically increasing the speed and precision with which objects can be extracted.
1. Introduction
Digital image composition has recently received much attention for special effects in movies and in a variety of desktop applications. In movies, image composition, combined with other digital manipulation techniques, has also been used to realistically blend old film into a new script. The goal of image composition is to combine objects or regions from various still photographs or movie frames to create a seamless, believable image or image sequence which appears convincing and real. Fig. 9(d) shows a believable composition created by combining objects extracted from three images, Fig. 9(a-c). These objects were digitally extracted and combined in a few minutes using a new, interactive tool called Intelligent Scissors.

When using existing images, objects of interest must be extracted and segmented from a surrounding background of unpredictable complexity. Manual segmentation is tedious and time consuming, lacking in precision, and impractical when applied to long image sequences. Further, due to the wide variety of image types and content, most current computer-based segmentation techniques are slow, inaccurate, and require significant user input to initialize or control the segmentation process.

This paper describes a new, interactive, digital image segmentation tool called “Intelligent Scissors” which allows rapid object extraction from arbitrarily complex backgrounds. Intelligent Scissors boundary detection formulates discrete dynamic programming (DP) as a two-dimensional graph searching problem. Presented as part of this tool are boundary cooling and on-the-fly training, which reduce user input and dynamically adapt the tool to specific types of edges. Finally, we present live-wire masking and spatial frequency equivalencing for convincing image compositions.
2. Background
Digital image segmentation techniques are used to extract image components from their surrounding natural background. However, currently available computer-based segmentation tools are typically primitive and often offer little more advantage than manual tracing.

Region-based magic wands, provided in many desktop applications, use an interactively selected seed point to “grow” a region by adding adjacent neighboring pixels. Since this type of region growing does not provide interactive visual feedback, resulting region boundaries must usually be edited or modified.

Other popular boundary definition methods use active contours or snakes [1, 5, 8, 15] to improve a manually entered rough approximation. After being initialized with a rough boundary approximation, snakes iteratively adjust the boundary points in parallel in an attempt to minimize an energy functional and achieve an optimal boundary. The energy functional is a combination of internal forces, such as boundary curvature, and external forces, like image gradient magnitude. Snakes can track frame-to-frame boundary motion provided the boundary hasn’t moved drastically. However, active contours follow a pattern of initialization followed by energy minimization; as a result, the user does not know what the final boundary will look like when the rough approximation is input. If the resulting boundary is not satisfactory, the process must be repeated or the boundary must be manually edited. We provide a detailed comparison of snakes and Intelligent Scissors in Section 3.6.

Another class of image segmentation techniques uses a graph searching formulation of DP (or similar concepts) to find globally optimal boundaries [2, 4, 10, 11, 14]. These techniques differ from snakes in that boundary points are generated in a stage-wise optimal cost fashion, whereas snakes iteratively minimize an energy functional for all points on a contour in parallel (giving the appearance of wiggling). However, like snakes, these graph searching techniques typically require a boundary template (in the form of a manually entered rough approximation, a figure of merit, etc.) which is used to impose directional sampling and/or searching constraints. This limits these techniques to a boundary search with one degree of freedom within a window about the two-dimensional boundary template. Thus, boundary extraction using previous graph searching techniques is non-interactive (beyond template specification), losing the benefits of further human guidance and expertise.
Eric N. Mortensen¹, William A. Barrett²
Brigham Young University
¹ enm@cs.byu.edu, Dept. of Comp. Sci., BYU, Provo, UT 84602, (801) 378-7605
² barrett@cs.byu.edu, Dept. of Comp. Sci., BYU, Provo, UT 84602, (801) 378-7430

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
©1995 ACM-0-89791-701-4/95/008…$3.50

The most important difference between previous boundary finding techniques and the Intelligent Scissors presented here lies not in the boundary defining criteria per se, but in the method of interaction. Namely, previous methods exhibit a pattern of boundary approximation followed by boundary refinement, whereas Intelligent Scissors allow the user to interactively select the most suitable boundary from a set of all optimal boundaries emanating from a seed point. In addition, previous approaches do not incorporate on-the-fly training or cooling, and are not as computationally efficient. Finally, it appears that the problem of automated matching of spatial frequencies for digital image composition has not been addressed previously.
3. Intelligent Scissors
Boundary definition via dynamic programming can be formulated as a graph searching problem [10] where the goal is to find the optimal path between a start node and a set of goal nodes. As applied to image boundary finding, the graph search consists of finding the globally optimal path from a start pixel to a goal pixel; in particular, pixels represent nodes and edges are created between each pixel and its 8 neighbors. For this paper, optimality is defined as the minimum cumulative cost path from a start pixel to a goal pixel, where the cumulative cost of a path is the sum of the local edge (or link) costs on the path.
3.1. Local Costs
Since a minimum cost path should correspond to an image component boundary, pixels (or, more accurately, links between neighboring pixels) that exhibit strong edge features should have low local costs, and vice versa. Thus, local component costs are created from the following edge features:

    Image Feature                Formulation
    Laplacian zero-crossing      f_Z
    Gradient magnitude           f_G
    Gradient direction           f_D

The local costs are computed as a weighted sum of these component functionals. Letting l(p,q) represent the local cost on the directed link from pixel p to a neighboring pixel q, the local cost function is

    l(p,q) = ω_Z · f_Z(q) + ω_D · f_D(p,q) + ω_G · f_G(q)    (1)

where each ω is the weight of the corresponding feature function. (Empirically, weights of ω_Z = 0.43, ω_D = 0.43, and ω_G = 0.14 seem to work well in a wide range of images.)

The laplacian zero-crossing is a binary edge feature used for edge localization [7, 9]. Convolution of an image with a laplacian kernel approximates the 2nd partial derivative of the image. The laplacian image zero-crossing corresponds to points of maximal (or minimal) gradient magnitude. Thus, laplacian zero-crossings represent “good” edge properties and should therefore have a low local cost. If I_L(q) is the laplacian of an image I at pixel q, then

    f_Z(q) = 0,  if I_L(q) = 0
    f_Z(q) = 1,  if I_L(q) ≠ 0    (2)

However, application of a discrete laplacian kernel to a digital image produces very few zero-valued pixels. Rather, a zero-crossing is represented by two neighboring pixels that change from positive to negative. Of the two pixels, the one closest to zero is used to represent the zero-crossing. The resulting feature cost contains single-pixel wide cost “canyons” used for boundary localization.

Since the laplacian zero-crossing creates a binary feature, f_Z(q) does not distinguish between strong, high gradient edges and weak, low gradient edges. However, gradient magnitude provides a direct correlation between edge strength and local cost. If I_x and I_y represent the partials of an image I in x and y respectively, then the gradient magnitude G is approximated with

    G = sqrt(I_x² + I_y²)

The gradient is scaled and inverted so high gradients produce low costs and vice versa. Thus, the gradient component function is

    f_G = (max(G) − G) / max(G) = 1 − G / max(G)    (3)

giving an inverse linear ramp function. Finally, gradient magnitude costs are scaled by Euclidean distance. To keep the resulting maximum gradient at unity, f_G(q) is scaled by 1 if q is a diagonal neighbor to p and by 1/√2 if q is a horizontal or vertical neighbor.

The gradient direction adds a smoothness constraint to the boundary by associating a high cost with sharp changes in boundary direction. The gradient direction is the unit vector defined by I_x and I_y. Letting D(p) be the unit vector perpendicular (rotated 90 degrees clockwise) to the gradient direction at point p (i.e., D(p) = (I_y(p), −I_x(p))), the formulation of the gradient direction feature cost is

    f_D(p,q) = (1/π) { cos⁻¹[d_p(p,q)] + cos⁻¹[d_q(p,q)] }    (4)

where

    d_p(p,q) = D(p) · L(p,q)
    d_q(p,q) = L(p,q) · D(q)

are vector dot products and

    L(p,q) = q − p,  if D(p) · (q − p) ≥ 0
    L(p,q) = p − q,  if D(p) · (q − p) < 0    (5)

is the bidirectional link or edge vector between pixels p and q. Links are either horizontal, vertical, or diagonal (relative to the position of q in p’s neighborhood) and point such that the dot product of D(p) and L(p,q) is positive, as noted in (5). The neighborhood link direction associates a high cost with an edge or link between two pixels that have similar gradient directions but are perpendicular, or near perpendicular, to the link between them. Therefore, the direction feature cost is low when the gradient directions of the two pixels are similar to each other and to the link between them.
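As a concrete illustration of equations (1)–(5), the sketch below computes the feature maps and the composite link cost in NumPy. This is not the authors' code: the central-difference gradient and 3×3 laplacian stand in for the paper's multi-scale kernels, and the border handling (wrap-around) is a simplification.

```python
import numpy as np

def local_cost_maps(img):
    """Per-pixel feature maps for the local costs of equations (1)-(3).

    `img` is a 2-D float array.  The simple kernels below stand in for
    the paper's multi-scale gradient kernels and laplacian."""
    Iy, Ix = np.gradient(img)              # partials in y (rows) and x (cols)
    G = np.sqrt(Ix ** 2 + Iy ** 2)         # gradient magnitude
    # f_G: scaled, inverted gradient magnitude (equation 3).
    fG = 1.0 - G / G.max() if G.max() > 0 else np.zeros_like(G)
    # f_Z: binary laplacian zero-crossing feature (equation 2), marking
    # the pixel of each sign-changing pair whose laplacian is closer to 0.
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
           np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    fZ = np.ones_like(img)
    for axis in (0, 1):
        nb = np.roll(lap, -1, axis)        # note: wraps at the border
        crossing = lap * nb < 0
        fZ[crossing & (np.abs(lap) <= np.abs(nb))] = 0.0
    return fZ, fG, Ix, Iy

def link_cost(p, q, fZ, fG, Ix, Iy, wZ=0.43, wD=0.43, wG=0.14):
    """l(p,q) of equation (1) for 8-connected neighbors p, q = (row, col)."""
    def D(r, c):                           # unit vector perpendicular to gradient
        v = np.array([Iy[r, c], -Ix[r, c]])
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    L = np.array([q[0] - p[0], q[1] - p[1]], float)
    if np.dot(D(*p), L) < 0:               # orient the link per equation (5)
        L = -L
    L /= np.linalg.norm(L)
    dp = np.clip(np.dot(D(*p), L), -1.0, 1.0)
    dq = np.clip(np.dot(L, D(*q)), -1.0, 1.0)
    fD = (np.arccos(dp) + np.arccos(dq)) / np.pi      # equation (4)
    diagonal = p[0] != q[0] and p[1] != q[1]
    scale = 1.0 if diagonal else 1.0 / np.sqrt(2)     # Euclidean scaling of f_G
    return wZ * fZ[q] + wD * fD + wG * scale * fG[q]
```

On a vertical intensity ramp, for example, a horizontal link perpendicular to the iso-brightness lines receives the full direction penalty, while f_G is zero along the uniform gradient.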
3.2. Two-Dimensional Dynamic Programming
As mentioned, dynamic programming can be formulated as a directed graph search for an optimal path. This paper utilizes an optimal graph search similar to that presented by Dijkstra [6] and extended by Nilsson [13]; further, this technique builds on and extends previous boundary tracking methods in 4 important ways:
1. It imposes no directional sampling or searching constraints.
2. It utilizes a new set of edge features and costs: laplacian zero-crossing, gradient magnitude, and gradient direction, maximized over multiple gradient kernels.
3. The active list is sorted with an O(N) sort for N nodes/pixels.
4. No a priori goal nodes/pixels are specified.
First, formulation of boundary finding as a 2-D graph search eliminates the directed sampling and searching restrictions of previous implementations, thereby allowing boundaries of arbitrary complexity to be extracted. Second, the edge features used here are more robust and comprehensive than previous implementations: we maximize over different gradient kernel sizes to encompass the various edge types and scales while simultaneously attempting to balance edge detail with noise suppression [7], and we use the laplacian zero-crossing for boundary localization and fine detail live-wire “snapping”. Third, the discrete, bounded nature of the local edge costs permits the use of a specialized sorting algorithm that inserts points into a sorted list (called the active list) in constant time. Fourth, the live-wire tool is free to define a goal pixel interactively, at any “free” point in the image, after minimum cost paths are computed to all pixels. The latter happens fast enough that the free point almost always falls within an expanding cost wavefront and interactivity is not impeded.
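The constant-time sorted active list of point 3 can be realized as a bucket (circular) queue when link costs are quantized to small integers: because cumulative costs leave the queue in non-decreasing order and each link cost is bounded, active priorities always span a window of at most max_link_cost + 1 consecutive values, so each bucket holds entries of a single priority. This is a sketch of one such structure, not necessarily the authors' exact layout:

```python
from collections import deque

class BucketQueue:
    """Monotone priority queue with O(1) push and pop for bounded
    integer priorities (a sketch of the constant-time active list)."""
    def __init__(self, max_link_cost):
        # One bucket per residue of the priority modulo the window size.
        self.nbuckets = max_link_cost + 1
        self.buckets = [deque() for _ in range(self.nbuckets)]
        self.cur = 0          # current minimum priority
        self.size = 0

    def push(self, priority, item):
        self.buckets[priority % self.nbuckets].append((priority, item))
        self.size += 1

    def pop_min(self):
        # Advance circularly to the next non-empty bucket; amortized O(1)
        # because `cur` only moves forward with the minimum priority.
        while not self.buckets[self.cur % self.nbuckets]:
            self.cur += 1
        self.size -= 1
        return self.buckets[self.cur % self.nbuckets].popleft()
```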
The Live-Wire 2-D dynamic programming (DP) graph search
algorithm is as follows:
Algorithm: Live-Wire 2-D DP graph search.
Input:
  s        {Start (or seed) pixel.}
  l(q,r)   {Local cost function for link between pixels q and r.}
Data Structures:
  L        {List of active pixels sorted by total cost (initially empty).}
  N(q)     {Neighborhood set of q (contains 8 neighbors of pixel).}
  e(q)     {Boolean function indicating if q has been expanded/processed.}
  g(q)     {Total cost function from seed point to q.}
Output:
  p        {Pointers from each pixel indicating the minimum cost path.}
Algorithm:
  g(s) ← 0; L ← s;                    {Initialize active list with zero cost seed pixel.}
  while L ≠ ∅ do begin                {While still points to expand:}
    q ← min(L);                       {Remove minimum cost pixel q from active list.}
    e(q) ← TRUE;                      {Mark q as expanded (i.e., processed).}
    for each r ∈ N(q) such that not e(r) do begin
      g_tmp ← g(q) + l(q,r);          {Compute total cost to neighbor.}
      if r ∈ L and g_tmp < g(r) then
        remove r from L;              {Remove higher cost neighbor from list.}
      if r ∉ L then begin             {If neighbor not on list,}
        g(r) ← g_tmp;                 {  assign neighbor’s total cost,}
        p(r) ← q;                     {  set (or reset) back pointer,}
        L ← r;                        {  and place on (or return to) active list.}
      end
    end
  end
Notice that since the active list is sorted, when a new, lower cumu-
lative cost is computed for a pixel already on the list then that point
must be removed from the list in order to be added back to the list
with the new lower cost. Similar to adding a point to the sorted list,
this operation is also performed in constant time.
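The algorithm above can be rendered compactly in Python. This sketch substitutes a binary heap with lazy deletion for the paper's constant-time bucket-sorted active list (a simplification that changes the complexity constants, not the result), and takes the local cost and neighborhood functions as parameters:

```python
import heapq

def live_wire_paths(seed, link_cost, neighbors):
    """Expand minimum-cost paths from `seed` to every reachable pixel.

    link_cost(q, r): local cost l(q,r) for adjacent pixels q, r.
    neighbors(q): iterable of q's 8-connected neighbors.
    Returns (g, back): total path costs and optimal-path back pointers."""
    g = {seed: 0.0}
    back = {seed: None}
    expanded = set()
    active = [(0.0, seed)]
    while active:
        cost, q = heapq.heappop(active)
        if q in expanded or cost > g[q]:
            continue                      # stale entry (lazy deletion)
        expanded.add(q)                   # e(q) <- TRUE
        for r in neighbors(q):
            if r in expanded:
                continue
            g_tmp = g[q] + link_cost(q, r)
            if r not in g or g_tmp < g[r]:
                g[r] = g_tmp              # assign (or lower) total cost
                back[r] = q               # set (or reset) back pointer
                heapq.heappush(active, (g_tmp, r))
    return g, back

def trace(free_point, back):
    """Follow optimal-path pointers from a free point back to the seed."""
    path = [free_point]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    return path[::-1]
```

Once `live_wire_paths` has run for a seed, `trace` is all the interactive tool needs per mouse movement: selecting a boundary segment is just a pointer walk, which is what makes the live wire feel instantaneous.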
Figure 1 demonstrates the use of the 2-D DP graph search algorithm to create a minimum cumulative cost path map (with corresponding optimal path pointers). Figure 1(a) is the initial local cost map with the seed point circled. For simplicity of demonstration, the local costs in this example are pixel based rather than link based and can be thought of as representing the gradient magnitude cost feature. Figure 1(b) shows a portion of the cumulative cost and pointer map after expanding the seed point (with a cumulative cost of zero). Notice how the diagonal local costs have been scaled by Euclidean distance (consistent with the gradient magnitude cost feature described previously). Though complicating the example, weighting by Euclidean distance is necessary to demonstrate that the cumulative costs to points currently on the active list can change if even lower cumulative costs are computed from as yet unexpanded neighbors. This is demonstrated in Figure 1(c), where two points have now been expanded: the seed point and the next lowest cumulative cost point on the active list. Notice how the points diagonal to the seed point have changed cumulative cost and direction pointers. The Euclidean weighting between the seed and diagonal points makes them more costly than non-diagonal paths. Figures 1(d), 1(e), and 1(f) show the cumulative cost/direction pointer map at various stages of completion. Note how the algorithm produces a “wavefront” of active points emanating from the initial start point, called the seed point, and that the wavefront grows out faster where there are lower costs.
3.3. Interactive “Live-Wire” Segmentation Tool
Once the optimal path pointers are generated, a desired boundary segment can be chosen dynamically via a “free” point. Interactive movement of the free point by the mouse cursor causes the boundary to behave like a live-wire as it adapts to the new minimum cost path by following the optimal path pointers from the free point back
Figure 1: (a) Initial local cost matrix. (b) Seed point (shaded) expanded. (c) 2 points (shaded) expanded. (d) 5 points (shaded) expanded. (e) 47 points expanded. (f) Finished total cost and path matrix with two of many paths (free points shaded) indicated.

to the seed point. By constraining the seed point and free points to lie near a given edge, the user is able to interactively “snap” and “wrap” the live-wire boundary around the object of interest. Figure 2 demonstrates how a live-wire boundary segment adapts to changes in the free point (cursor position) by latching onto more and more of an object boundary. Specifically, note the live-wire segments corresponding to user-specified free point positions at times t0, t1, and t2. Although Fig. 2 only shows live-wire segments for three discrete time instances, live-wire segments are actually updated dynamically and interactively (on-the-fly) with each movement of the free point.

When movement of the free point causes the boundary to digress from the desired object edge, interactive input of a new seed point prior to the point of departure reinitiates the 2-D DP boundary detection. This causes potential paths to be recomputed from the new seed point while effectively “tying off” the boundary computed up to the new seed point.

Note again that optimal paths are computed from the seed point to all points in the image (since the 2-D DP graph search produces a minimum cost spanning tree of the image [6]). Thus, by selecting a free point with the mouse cursor, the interactive live-wire tool is simply selecting an optimal boundary segment from a large collection of optimal paths.

Since each pixel (or free point) defines only one optimal path to a seed point, a minimum of two seed points must be placed to ensure a closed object boundary. The path map from the first seed point of every object is maintained during the course of an object’s boundary definition to provide a closing boundary path from the free point. The closing boundary segment from the free point to the first seed point expedites boundary closure.

Placing seed points directly on an object’s edge is often difficult and tedious. If a seed point is not localized to an object edge, then spikes result on the segmented boundary at those seed points (since
Figure 2: Image demonstrating how the live-wire segment adapts and snaps to an object boundary as the free point moves (via cursor movement). The path of the free point is shown in white. Live-wire segments from previous free point positions (t0, t1, and t2) are shown in green.

Figure 3: Comparison of live-wire without (a) and with (b) cooling. Without cooling (a), all seed points must be placed manually on the object edge. With cooling (b), seed points are generated automatically as the live-wire segment freezes.
the boundary is forced to pass through the seed points). To facilitate seed point placement, a cursor snap is available which forces the mouse pointer to the maximum gradient magnitude pixel within a user-specified neighborhood. The neighborhood can be anywhere from 1×1 (resulting in no cursor snap) to 15×15 (where the cursor can snap as much as 7 pixels in both x and y). Thus, as the mouse cursor is moved by the user, it snaps or jumps to a neighborhood pixel representing a “good” static edge point.
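The cursor snap described above reduces to an argmax over a clamped window of the gradient-magnitude image; a minimal sketch (function name and tie-breaking behavior are illustrative, not from the paper):

```python
import numpy as np

def snap_cursor(G, pos, k=15):
    """Snap `pos` = (row, col) to the maximum-gradient pixel in a k x k
    window of the gradient-magnitude image G.  k=1 disables the snap;
    k=15 allows jumps of up to 7 pixels in both x and y."""
    h = k // 2
    # Clamp the window to the image bounds.
    r0, r1 = max(pos[0] - h, 0), min(pos[0] + h + 1, G.shape[0])
    c0, c1 = max(pos[1] - h, 0), min(pos[1] + h + 1, G.shape[1])
    window = G[r0:r1, c0:c1]
    dr, dc = np.unravel_index(np.argmax(window), window.shape)
    return (r0 + dr, c0 + dc)
```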
3.4. Path Cooling
Generating closed boundaries around objects of interest can require as few as two seed points (for reasons given previously). Simple objects typically require two to five seed points, but complex objects may require many more. Even with cursor snap, manual placement of seed points can be tedious and often requires a large portion of the overall boundary definition time.
Figure 4: Comparison of live-wire (a) without and (b) with dynamic training. (a) Without training, the live-wire segment snaps to nearby strong edges. (b) With training, it favors edges with similar characteristics as those just learned. (c) The static gradient magnitude cost map shows that without training, high gradients are favored since they map to low costs. However, with training, the dynamic cost map (d) favors gradients similar to those sampled from the previous boundary segment.

Automatic seed point generation relieves the user from precise manual placement of seed points by automatically selecting a pixel on the current active boundary segment to be a new seed point. Selection is based on “path cooling”, which in turn relies on path coalescence. Though a single minimum cost path exists from each pixel to a given seed point, many paths “coalesce” and share portions of their optimal path with paths from other pixels. Due to Bellman’s Principle of Optimality [3], if any two optimal paths from two distinct pixels share a common point or pixel, then the two paths are identical from that pixel back to the seed point. This is particularly noticeable if the seed point is placed near an object edge and the free point is moved away from the seed point but remains in the vicinity of the object edge. Though a new optimal path is selected and displayed every time the mouse cursor moves, the paths are typically identical near the seed point and object edges and only change local to the free point. As the free point moves farther and farther away from the seed point, the portion of the active live-wire boundary segment that does not change becomes longer. New seed points are generated at the end of a stable segment (i.e., one that has not changed recently). Stability is measured by time (in milliseconds) on the active boundary and path coalescence (number of times the path has been redrawn from distinct free points).

This measure of stability provides the live-wire segment with a sense of “cooling”. The longer a pixel is on a stable section of the live-wire boundary, the cooler it becomes until it eventually freezes and automatically produces a new seed point.

Figure 3 illustrates the benefit of path cooling. In Fig. 3(a), the user must place each seed point manually on the object boundary. However, with cooling (Fig. 3(b)), only the first seed point (and last free point) need to be specified manually; the other seed points were generated automatically via cooling.
3.5. Interactive Dynamic Training
On occasion, a section of the desired object boundary may have a weak gradient magnitude relative to a nearby strong gradient edge. Since the nearby strong edge has a relatively lower cost, the live-wire segment snaps to the strong edge rather than the desired weaker edge. This can be seen in Fig. 4(a). The desired boundary is the woman’s (Harriet’s) cheek. However, since part of it is so close to the high contrast shoulder of the man (Ozzie), the live-wire snaps to the shoulder.

Training allows dynamic adaptation of the cost function based on a sample boundary segment. Training exploits an object’s boundary segment that is already considered to be good and is performed dynamically as part of the boundary segmentation process. As a result, trained features are updated interactively as an object boundary is being defined. On-the-fly training eliminates the need for a separate training phase and allows the trained feature cost functions to adapt within the object being segmented as well as between objects in the image. Fig. 4(b) demonstrates how a trained live-wire segment latches onto the edge that is similar to the previous training segment rather than the nearby stronger edge.

To facilitate training and trained cost computation, a gradient magnitude feature map or image is precomputed by scaling the minimized gradient magnitude image, G′, into an integer range of size n_G (i.e., from 0 to n_G − 1). The actual feature cost is determined by mapping these feature values through a look-up table which contains the scaled (weighted) cost for each value. Fig. 4(c) illustrates edge cost based on gradient magnitude without training. Note that with training (Fig. 4(d)) edge cost plummets for gradients that are specific to the object of interest’s edges.
Selection of a “good” boundary segment for training is made interactively using the live-wire tool. To allow training to adapt to slow (or smooth) changes in edge characteristics, the trained gradient magnitude cost function is based only on the most recent or closest portion of the currently defined object boundary. A training length, t, specifies how many of the most recent boundary pixels are used to generate the training statistics. A monotonically decreasing weight function (either linear or Gaussian based) determines the contribution from each of the closest t pixels. This permits adaptive training with local dependence, preventing the trained features from being overly influenced by old edge characteristics. The closest pixel (i.e., the current active boundary segment endpoint) gets a weight of 1 and the point that is t pixels away, along the boundary from the current active endpoint, gets a minimal weight (which can be determined by the user). The training algorithm samples the precomputed feature maps along the closest t pixels of the edge segment and increments the feature histogram element by the corresponding pixel weight to generate a histogram for each feature involved in training.

After sampling and smoothing, each feature histogram is then scaled and inverted (by subtracting the scaled histogram values from its maximum value) to create the feature cost map needed to convert feature values to trained cost functions.
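The two paragraphs above amount to a weighted histogram followed by smoothing, scaling, and inversion. A minimal sketch, assuming quantized gradient features in [0, n_bins) and a linear weight ramp (the smoothing kernel and default parameters are illustrative choices, not the paper's):

```python
import numpy as np

def trained_gradient_cost(feature_values, t=64, n_bins=256, w_min=0.1):
    """Build a trained cost lookup table from the most recent t boundary
    pixels' quantized gradient features.

    A linearly decreasing weight (1 at the current endpoint, w_min at the
    t-th pixel back) fills a histogram; the smoothed histogram is then
    scaled and inverted so frequently sampled gradients map to low cost."""
    recent = feature_values[-t:]                     # closest t pixels
    weights = np.linspace(1.0, w_min, len(recent))[::-1]
    hist = np.zeros(n_bins)
    for value, weight in zip(recent, weights):
        hist[value] += weight
    # Light smoothing so near-miss gradient values also benefit.
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    hist = np.convolve(hist, kernel, mode="same")
    # Scale and invert: cost = (max - hist) / max, normalized to [0, 1].
    return (hist.max() - hist) / hist.max()
```

The returned array plays the role of the look-up table described earlier: indexing it by a pixel's quantized gradient value yields that pixel's trained f_G cost.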
Since training is based on learned edge characteristics from the most recent portion of an object’s boundary, training is most effective for those objects with edge properties that are relatively consistent along the object boundary (or, if changing, at least change smoothly enough for the training algorithm to adapt). In fact, training can be counter-productive for objects with sudden and/or dramatic changes in edge features. However, training can be turned on and off interactively throughout the definition of an object boundary so that it can be used (if needed) in a section of the boundary with similar edge characteristics and then turned off before a drastic change occurs.
3.6 Comparison with Snakes
Due to the recent popularity of snakes and other active contours
models and since the interactive boundary wrapping of the live-
wire may seem similar to the “wiggling” of snakes, we highlight
what we feel are the similarities and their corresponding differences
between snakes and Intelligent Scissors.
Similarities (compare with corresponding differences below):
1. The gradient magnitude cost in Intelligent Scissors is similar to
the edge energy functional used in snakes.
2. Both methods employ a smoothing term to minimize the effects
of noise in the boundary.
3. Snakes and live-wire boundaries are both attracted towards
strong edge features.
4. Both techniques attempt to find globally optimal boundaries to
try to overcome the effects of noise and edge dropout.
5. Snakes and Intelligent Scissors both require interaction as part of
the boundary segmentation process.
Differences (compare with corresponding similarities above):
1. The laplacian zero-crossing binary cost feature seems to have not
been used previously in active contours models
1
(or DP bound-
ary tracking methods for that matter).
2. The active contour smoothing term is internal (i.e., based on the
contours point positions) whereas the smoothing term for live-
wire boundaries is computed from external image gradient direc-
tions
2(next page)
.
1. Kass et al. [8] did use a squared laplacian energy functional to show the relationship of scale-space continuation to the Marr-Hildreth edge detection theory. However, the squared laplacian does not represent a binary condition, nor could it, since the variational calculus minimization used in [8] required that all functionals be differentiable.
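The binary laplacian zero-crossing cost of difference 1 can be sketched as follows: a pixel costs 0 if it lies on a zero-crossing of the laplacian (its laplacian is zero, or a neighbour has opposite sign and larger magnitude) and 1 otherwise. This is a simplified single-kernel, 4-neighbour version (Python/NumPy); the paper convolves with laplacian kernels of more than one size, so details may differ:

```python
import numpy as np

def laplacian_zero_crossing_cost(image):
    """Binary laplacian zero-crossing cost f_Z (simplified sketch).

    f_Z(q) = 0 if q lies on a zero-crossing of the laplacian (its
    laplacian is zero, or a 4-neighbour has opposite sign and larger
    magnitude), and 1 otherwise.  A single 5-point stencil is used
    here; borders are excluded for simplicity.
    """
    img = image.astype(float)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4.0 * img[1:-1, 1:-1])
    cost = np.ones_like(img)
    for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        nb = np.roll(lap, (-dy, -dx), axis=(0, 1))   # neighbour's laplacian
        cross = (lap * nb < 0) & (np.abs(lap) <= np.abs(nb))
        cost[cross] = 0.0                # this pixel is nearer the crossing
    cost[lap == 0] = 0.0                 # exactly on a zero-crossing
    cost[0, :] = cost[-1, :] = cost[:, 0] = cost[:, -1] = 1.0
    return cost
```

On a vertical step edge, the two columns flanking the step receive cost 0 while the rest stay at 1, which is what makes this a binary edge condition rather than the differentiable squared laplacian discussed in footnote 1.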
References
- E. W. Dijkstra. "A Note on Two Problems in Connexion with Graphs."
- M. Kass, A. Witkin, and D. Terzopoulos. "Snakes: Active Contour Models." (cited as [8] above)
- R. Bellman. Dynamic Programming.
- D. Marr and E. Hildreth. "Theory of Edge Detection."
- D. H. Ballard and C. M. Brown. Computer Vision.