
Flattening Curved Documents in Images
Jian Liang, Daniel DeMenthon, David Doermann
Language And Media Processing Laboratory
University of Maryland
College Park, MD 20742
{lj,daniel,doermann}@cfar.umd.edu
Abstract
Compared to scanned images, document pictures captured by camera can suffer from distortions due to perspective and page warping. It is necessary to restore a frontal planar view of the page before other OCR techniques can be applied. In this paper we describe a novel approach for flattening a curved document in a single picture captured by an uncalibrated camera. To our knowledge this is the first reported method able to process general curved documents in images without camera calibration. We propose to model the page surface by a developable surface, and exploit the properties (parallelism and equal line spacing) of the printed textual content on the page to recover the surface shape. Experiments show that the output images are much more OCR friendly than the original ones. While our method is designed to work with any general developable surface, it can be adapted for typical special cases including planar pages, scans of thick books, and opened books.
1. Introduction
Digital cameras have proliferated rapidly in recent years due to their small size, ease of use, fast response, rich set of features, and dropping price. For the OCR community, they present an attractive alternative to scanners as imaging devices for capturing documents because of their flexibility. However, compared to digital scans, camera-captured document images often suffer from many degradations, both from intrinsic limits of the devices and because of the unconstrained external environment. Among many new challenges, one of the most important is the distortion due to perspective and curved pages. Current OCR techniques are designed to work with scans of flat 2D documents, and cannot handle distortions involving 3D factors.
One way of dealing with these 3D factors is to use special equipment such as structured light to measure the 3D range data of the document, and recover the 2D plane of the page [1, 12]. The requirement for costly equipment, however, makes these approaches unattractive.
The problem of recovering planar surface orientations from images has been addressed by many researchers inside the general framework of shape estimation [5, 7, 10], and applied to the removal of perspective in images of flat documents [3, 4, 11]. However, page warping adds a non-linear, non-parametric process on top of this, making it much more difficult to recover the 3D shape. As a way out, researchers add more domain knowledge and constraints. For example, when scanning thick books, the portion near the book spine forms a cylinder shape [8], and results in curved text lines in the image. Zhang and Tan [16] estimate the cylinder shape from the varying shade in the image, assuming that flatbed scanners have a fixed light projection direction. For camera-captured document images, Cao et al. [2] use a parametric approach to estimate the cylinder shape of an opened book. Their method relies on text lines formed by bottom-up clustering of connected components. Apart from the cylinder shape assumption, they also have a restriction on the pose that requires the image plane to be parallel to the generatrix of the page cylinder. Gumerov et al. [6] present a method for shape estimation from single views of developable surfaces. They do not require cylinder shapes or special poses. However, they require correspondences between closed contours in the image and in the unrolled page. They propose to use the rectilinear page boundaries or margins in document images as contours. This may not be applicable when part of the page is occluded.
Another way out is to bypass the shape estimation step and produce an approximate flat view of the page, with what we call shape-free methods. For scans of thick bound volumes, Zhang and Tan [15] have another method for straightening curved text lines. They find text line curves by clustering connected components, and move the components to restore straight horizontal baselines. The shape is still unknown, but the image can be OCRed. Under the same cylinder shape and parallel view assumptions as Cao et al., Tsoi et al. [14] flatten images of opened books by a bilinear morphing operation which maps the curved page boundaries to a rectangle. Their method is also shape-free. Although shape-free methods are simpler, they can only deal with small distortions and cannot be applied when shape and pose are arbitrary.
Our goal is to restore a frontal planar image of a warped document page from a single picture captured by an uncalibrated digital camera. Our method is based on two key observations: 1) a curved document page can be modeled by a developable surface, and 2) printed textual content on the page forms texture flow fields that provide strong constraints on the underlying surface shape [9]. More specifically, we extract two texture flow fields from the textual area in the projected image, which represent the local orientations of projected text lines and vertical character strokes respectively. The intrinsic parallelism of the texture flow vectors on the curved page is used to detect the projected rulings, and the equal text line spacing property on the page is used to compute the vanishing points of the surface rulings. Then a developable surface is fitted to the rulings and texture flow fields, and the surface is unrolled to generate the flat page image.
Printed textual content provides the most prominent and stable visual features in document images [3, 11, 2, 15]. In real applications, other visual cues are not as reliable. For example, shading may be biased by multiple light sources; contours and edges may be occluded. In terms of how printed textual content is used, our work differs from [15, 2] in that we do not rely on connected component analysis, which may have difficulty with figures or tables. The mixture of text and non-text elements also makes traditional shape-from-texture techniques difficult to apply, while our texture flow based method still works. Overall, compared to others' work, our method does not require a flat page, does not require 3D range data, does not require camera calibration, does not require special shapes or poses, and can be applied to arbitrary developable document pages.
The remainder of this paper is organized into five sections. Section 2 introduces developable surfaces and describes the texture flow fields generated by printed text on document pages. Section 3 focuses on texture flow field extraction. We describe the details of surface estimation in Section 4, and discuss the experimental results in Section 5. Section 6 concludes the paper.

Figure 1. Strip approximation of a developable surface.
2. Problem Modeling
The shape of a smoothly rolled document page can be modeled by a developable surface. A developable surface can be mapped isometrically onto a Euclidean plane, or in plain English, can be unrolled onto a plane without tearing or stretching. This process is called development. Development does not change the intrinsic properties of the surface, such as curve length or the angle formed by curves.
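As a concrete illustration of this isometry, the minimal Python sketch below (not part of the original paper; the curve and cylinder radius are arbitrary choices) rolls a flat page onto a cylinder, one kind of developable surface, and checks numerically that a curve drawn on the page keeps its length:

```python
import numpy as np

def roll_onto_cylinder(u, v, r=0.5):
    """Isometric map from flat-page coordinates (u, v) onto a cylinder of
    radius r; the u-axis wraps around the cylinder, v runs along its axis."""
    return np.stack([r * np.sin(u / r), v, r * (1.0 - np.cos(u / r))], axis=-1)

# An arbitrary curve drawn on the flat page.
t = np.linspace(0.0, 1.0, 2000)
u, v = 2.0 * t, np.sin(3.0 * t)
flat = np.stack([u, v], axis=-1)
curved = roll_onto_cylinder(u, v)

# Polyline lengths agree to discretization error: development preserves length.
length = lambda pts: np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
print(length(flat), length(curved))
```

The map's partial derivatives form an orthonormal frame, so lengths and angles are preserved exactly; the printed values differ only by the polyline discretization.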
Rulings play a very important role in defining developable surfaces. Through any point on the surface there is one and only one ruling, except in the degenerate case of a plane. No two rulings intersect, except at conic vertices. All points along a ruling share a common tangent plane. It is well known in elementary differential geometry that, given sufficient differentiability, a developable surface is either a plane, a cylinder, a cone, the collection of the tangents of a curve in space, or a composition of these types. On a cylindrical surface, all rulings are parallel; on a conic surface, all rulings intersect at the conic vertex; for the tangent surface case, the rulings are the tangent lines of the underlying space curve; only in the planar case are rulings not uniquely defined.
The fact that all points along a ruling of a developable surface share a common tangent plane to the surface leads to the result that the surface is the envelope of a one-parameter family of planes, which are its tangent planes. Therefore a developable surface can be piecewise approximated by planar strips that belong to the family of tangent planes (Fig. 1). Although this is only a first-order approximation, it is sufficient for our application. The group of planar strips can be fully described by a set of reference points $\{P_i\}$ along a curve on the surface, and the surface normals $\{N_i\}$ at these points.
Suppose that for every point on a developable surface we select a tangent vector; we say that the tangents are parallel with respect to the underlying surface if, when the surface is developed, all tangents are parallel in the 2D space. A developable surface covered by a uniformly distributed non-isotropic texture can result in the perception of a parallel tangent field. On document pages, the texture of printed textual content forms two parallel tangent fields: the first field follows the local text line orientation, and the second field follows the vertical character stroke orientation. Since the text line orientation is more prominent, we call the first field the major tangent field and the second the minor tangent field.
The two 3D tangent fields are projected to two 2D flow fields in camera-captured images, which we call the major and minor texture flow fields, denoted $E_M$ and $E_m$. The 3D rulings on the surface are also projected to 2D lines on the image, which we call the 2D rulings or projected rulings.
The texture flow fields and 2D rulings are not directly visible. Section 3 introduces our method of extracting texture flow from textual regions of document images. The texture flow is used in Section 4 to derive projected rulings, find vanishing points of rulings, and estimate the page shape.
3. Texture Flow Computation
We are only interested in the texture flow produced by printed textual content in the image, therefore we need to first detect the textual area and textual content. Among the various text detection schemes proposed in the literature we adopt a simple one, since this is not the focus of this work. We use an edge detector to find pixels with strong gradient, and apply an open operator to expand those pixels into the textual area. Although simple, this method works well for document images with simple backgrounds. Then we use Niblack's adaptive thresholding [13] to get binary images of textual content (Fig. 2). The binarization does not have to be perfect, since we only use it to compute the texture flow fields, not for OCR.
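A minimal sketch of this detection-plus-binarization step follows, using OpenCV and scikit-image. It is an illustration rather than the paper's implementation: the Canny thresholds, kernel size, and Niblack window/k are assumed values, morphological closing stands in for the paper's pixel-expansion step, and `gray` is assumed to be an 8-bit grayscale image.

```python
import cv2
import numpy as np
from skimage.filters import threshold_niblack

def detect_text_and_binarize(gray):
    # Pixels with strong gradient are candidate text pixels.
    edges = cv2.Canny(gray, 50, 150)
    # Expand edge pixels into a solid text-area mask (assumed kernel size).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    text_area = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    # Niblack's adaptive threshold (T = mean - k * std in scikit-image).
    t = threshold_niblack(gray, window_size=25, k=0.2)
    binary = (gray < t) & (text_area > 0)    # dark text on light background
    return text_area, binary.astype(np.uint8)
```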
Figure 2. Text area detection and text binarization. (a) A document image captured by a camera. (b) Detected text area. (c) Binary text image.

The local texture flow direction can be viewed as a local skew direction. We divide the image into small blocks, and use projection profile analysis to compute the local skew at the center of each block. Instead of computing one skew angle, we compute several prominent skew angles as candidates. Initially their confidence values represent the energy of the corresponding projection profiles. A relaxation process follows to adjust confidences in such a way that candidates that agree with their neighbors get higher confidences. As a result, the local text line directions are found. The relaxation process is necessary because, due to randomness in small image blocks, the text line orientations may not initially be the most prominent. We use interpolation and extrapolation to fill a dense texture flow field $E_M$ that covers every pixel. Next, we remove the major texture flow directions from the local skew candidates, reset confidences for the remaining candidates, and apply the relaxation again. This time the results are the local vertical character stroke orientations. We compute a dense minor texture flow field $E_m$ in the same way.
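One way to realize the per-block skew-candidate step is sketched below; the angle grid, bin count, and the squared-bin "energy" score are assumptions, and the relaxation stage is omitted:

```python
import numpy as np

def skew_candidates(block, top_k=4):
    """Return the top_k candidate skew angles (radians) for one binary
    image block, scored by projection-profile energy."""
    ys, xs = np.nonzero(block)                     # text pixel coordinates
    if xs.size == 0:
        return np.array([]), np.array([])
    angles = np.deg2rad(np.arange(-45.0, 45.1, 1.0))
    scores = np.empty(len(angles))
    for i, a in enumerate(angles):
        # Signed distance of each pixel to a line at angle a through the origin.
        proj = ys * np.cos(a) - xs * np.sin(a)
        hist, _ = np.histogram(proj, bins=max(block.shape))
        scores[i] = np.sum(hist.astype(float) ** 2)  # profile "energy"
    best = np.argsort(scores)[::-1][:top_k]
    return angles[best], scores[best]
```

Sharp, well-aligned text lines concentrate pixels into few profile bins, which maximizes the squared-bin energy; this is the standard projection-profile skew criterion the paper builds on.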
Fig. 3 shows the major and minor texture flow fields computed from a binarized text image. Notice that $E_M$ is quite good in Fig. 3(c) even though two figures are embedded in the text. Overall, $E_M$ is much more accurate than $E_m$.
4. Page Shape Estimation
4.1. Finding Projected Rulings
Figure 3. Texture flow detection. (a) The four local skew candidates in a small block; after relaxation the two middle candidates are eliminated. (b)(c) Visualization of the major texture flow field. (d)(e) Visualization of the minor texture flow field.

Consider a developable surface D, a ruling R on D, the tangent plane T at R, and a parallel tangent field V defined on D. For a group of points $\{P_i\}$ along R, all the tangents $\{V(P_i)\}$ at these points lie on T, and are parallel. Suppose the camera projection maps $\{P_i\}$ to $\{p_i\}$, and $\{V(P_i)\}$ to $\{v(p_i)\}$. Then under orthographic projection, the $\{v(p_i)\}$ are parallel lines on the image plane; under spherical projection, the $\{v(p_i)\}$ all lie on great circles on the view sphere that intersect at two common points; and under perspective projection, the $\{v(p_i)\}$ are lines that share a common vanishing point. Therefore, in theory, given $E_M$ or $E_m$ we can detect projected rulings by testing the texture flow orientations along a ruling candidate against the above principles.
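The perspective case is easy to verify numerically. In the toy sketch below (unit focal length assumed; not from the paper), lines sharing a 3D direction $d$ project to image lines that converge to the common vanishing point $(d_x/d_z,\, d_y/d_z)$:

```python
import numpy as np

def project(P):
    """Pinhole projection with unit focal length, camera at the origin."""
    P = np.atleast_2d(np.asarray(P, dtype=float))
    return P[:, :2] / P[:, 2:3]

d = np.array([1.0, 0.5, 2.0])                  # shared 3D direction
v = d[:2] / d[2]                               # predicted vanishing point
for P0 in ([0.0, 0.0, 5.0], [1.0, -1.0, 6.0]):
    far = project(np.asarray(P0) + 1e7 * d)    # walk far along each line
    print(far, v)                              # projections approach v
```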
However, due to the errors in the estimated $E_M$ and $E_m$, we have found that the texture flow at individual pixels has too much noise for this direct method to work well. Instead, we propose to use small blocks of the texture flow field to increase the robustness of ruling detection.
In a simplified case, consider the image of a cylindrical surface covered with a parallel tangent field under orthographic projection. Suppose we take two small patches of the same shape (this is possible for a cylindrical surface) on the surface along a ruling. We can show that the two tangent sub-fields in the two patches project to two identical texture flow sub-fields in the image. This idea can be extended to general developable surfaces and perspective projection, as locally a developable surface can be approximated by cylindrical surfaces, and the projection can be approximated by orthographic projection. If the two patches are not taken along the same ruling, however, the above property will not hold. Therefore we have the following pseudo-code for detecting a 2D ruling that passes through a given point (x, y) (see Fig. 4):
1. For each ruling direction candidate $\theta \in [0, \pi)$ do the following:

   (a) Fix the line $l(\theta, x, y)$ that passes through $(x, y)$ and has angle $\theta$ with respect to the x-axis.

   (b) Slide the center of a window along $l$ at equal steps and collect the major texture flow field inside the window as a sub-field, giving $\{E_i\}_{i=1}^{n}$, where $n$ is the number of such sub-fields.

   (c) The score of the candidate $l(\theta, x, y)$ is

   $$s(\theta) = \frac{\sum_{i=2}^{n} d(E_{i-1}, E_i)}{n} \qquad (1)$$

   where $d(E_{i-1}, E_i)$ measures the difference between two sub-fields, which in our implementation is the sum of squared differences.

2. Output the $\theta$ that corresponds to the smallest $s(\theta)$ as the ruling direction.
We have found that the result is insensitive to the window size and step length over a large range.
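A sketch of this scoring loop is given below. It assumes `flow` is a dense array of text-line angles in radians; the window size, step length (twice the window half-width), and angle resolution are assumed values, not the paper's:

```python
import numpy as np

def ruling_direction(flow, x, y, win=15, n_steps=6):
    """Pick the ruling angle through (x, y) whose candidate line crosses
    the most self-similar sequence of texture-flow sub-fields (Eq. 1)."""
    h, w = flow.shape
    best_theta, best_score = 0.0, np.inf
    for theta in np.linspace(0.0, np.pi, 180, endpoint=False):
        dx, dy = np.cos(theta), np.sin(theta)
        subs = []
        for i in range(-n_steps, n_steps + 1):
            cx = int(round(x + 2 * win * i * dx))
            cy = int(round(y + 2 * win * i * dy))
            if win <= cx < w - win and win <= cy < h - win:
                subs.append(flow[cy - win:cy + win, cx - win:cx + win])
        if len(subs) < 2:
            continue
        # Mean sum-of-squared-differences between consecutive sub-fields.
        score = np.mean([np.sum((a - b) ** 2)
                         for a, b in zip(subs, subs[1:])])
        if score < best_score:
            best_theta, best_score = theta, score
    return best_theta
```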
To find a group of projected rulings that cover the whole text area, first a group of reference points is automatically selected, then for each point a projected ruling is computed. Because no two rulings intersect inside the 3D page, we add the restriction that two nearby projected rulings must not intersect inside the textual area.
As Fig. 4 shows, our ruling detection scheme works better in high-curvature parts of the surface than in flat parts. One reason is that in flat parts the rulings are not uniquely defined. On the other hand, note that when the surface curvature is small, the shape recovery is not sensitive to the ruling detection result, so the reduced accuracy in ruling computation does not have severe adverse effects on the final result.
4.2. Computing Vanishing Points of Rulings
We compute the vanishing points of rulings based on the equal text line spacing property in documents. For printed text lines in a paragraph, the line spacing is usually fixed. When a 3D ruling intersects these text lines, the intersections are equidistant in 3D space. Under perspective projection, if the 3D ruling is not parallel to the image plane, these intersections project to non-equidistant points on the image, and the changes of distances can reveal the vanishing point position:
Let $\{P_i\}_{i=-\infty}^{\infty}$ be a set of points along a line in 3D space such that $|P_i P_{i+1}|$ is constant. A perspective projection maps $P_i$ to $p_i$ on the image plane. Then by the invariance of the cross ratio we have

$$\frac{|p_i p_j|\,|p_k p_l|}{|p_i p_k|\,|p_j p_l|} = \frac{|P_i P_j|\,|P_k P_l|}{|P_i P_k|\,|P_j P_l|} = \frac{|i-j|\,|k-l|}{|i-k|\,|j-l|}, \quad \forall\, i, j, k, l. \qquad (2)$$

As a result we have

$$\frac{|p_i p_{i+1}|\,|p_{i+2} p_{i+3}|}{|p_i p_{i+2}|\,|p_{i+1} p_{i+3}|} = \frac{1}{4}, \quad \forall\, i, \qquad (3)$$

and

$$\frac{|p_i p_{i+1}|\,|p_{i+2}\, v|}{|p_i p_{i+2}|\,|p_{i+1}\, v|} = \frac{1}{2}, \quad \forall\, i, \qquad (4)$$

where $v$ is the vanishing point corresponding to $p_{\infty}$ or $p_{-\infty}$.

Figure 4. Projected ruling estimation. (a) Two projected ruling candidates and three image patches along the ruling candidates. (b) The estimated rulings. (c)(d)(e) Enlarged image patches. Notice that (c) and (d) have similar texture flow (but dissimilar texture) and are both different from (e).
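Eq. 3 and Eq. 4 are easy to check numerically. The toy sketch below (unit focal length assumed; not from the paper) projects equidistant 3D points through a pinhole and evaluates both ratios:

```python
import numpy as np

project = lambda P: P[:, :2] / P[:, 2:3]       # pinhole, unit focal length
dist = lambda a, b: np.linalg.norm(a - b)

d = np.array([0.3, 0.8, 1.0])                  # line direction, tilted in depth
P = np.array([0.2, -0.1, 4.0]) + 0.5 * np.arange(4)[:, None] * d
p = project(P)                                 # images of equidistant 3D points
v = d[:2] / d[2]                               # vanishing point of the line

r3 = dist(p[0], p[1]) * dist(p[2], p[3]) / (dist(p[0], p[2]) * dist(p[1], p[3]))
r4 = dist(p[0], p[1]) * dist(p[2], v) / (dist(p[0], p[2]) * dist(p[1], v))
print(r3, r4)                                  # 0.25 and 0.5, per Eq. 3 and Eq. 4
```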
We will come back to Eq. 4 and Eq. 3 after we describe how we find $\{p_i\}$. We use a modified projection profile analysis to find the intersections of a projected ruling and the text lines. Usually a projection profile is built by projecting pixels in a fixed direction onto a base line, such that each bin of the profile is $\sum I(x, y : ax + by = 0)$. We call this a linear projection profile, which is suitable for straight text lines. When text lines are curved, we instead project pixels along the curve onto the base line (the projected ruling in our context), such that each bin is $\sum I(x, y : f(x, y) = 0)$, where $f$ defines the curve. We call the result a curve-based projection profile (CBPP). The peaks of a CBPP correspond to positions where text lines intersect the base line (assuming text pixels have intensity 1).

Figure 5. Computing the vanishing point of a 2D ruling. (a) A 2D ruling in the document image. (b) The curve-based projection profile (CBPP) along the ruling in (a). (c) The smoothed and binarized CBPP with three text blocks identified. In each text block, the line spacing between top lines in the image (to the left in the profile graph) is smaller than that between lower lines (although this is not very visible to the eye). This difference is due to perspective foreshortening and is exploited to recover the vanishing point. In this particular case, the true vanishing point is (3083.70, 6225.06) and the estimated value is (3113, 5907) (both in pixel units).

Fig. 5 shows how we identify the text line positions along a ruling.
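A sketch of building a CBPP follows. Here `binary` is the binary text image, `ruling` an (n, 2) array of (x, y) samples along the projected ruling (assumed to lie inside the image), and `flow` a dense field of text-line angles in radians; the trace length `reach` is an assumed parameter:

```python
import numpy as np

def trace_sum(binary, flow, x, y, sign, steps):
    """Sum text pixels along the texture-flow curve starting at (x, y)."""
    h, w = binary.shape
    total = 0.0
    for _ in range(steps):
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= xi < w and 0 <= yi < h):
            break
        total += binary[yi, xi]
        a = flow[yi, xi]                     # local text-line direction
        x, y = x + sign * np.cos(a), y + sign * np.sin(a)
    return total

def cbpp(binary, ruling, flow, reach=60):
    """One bin per ruling sample: text pixels summed along the curved
    text line through that sample, traced in both directions."""
    profile = np.zeros(len(ruling))
    for b, (x0, y0) in enumerate(ruling):
        a0 = flow[int(round(y0)), int(round(x0))]
        fwd = trace_sum(binary, flow, x0, y0, +1.0, reach)
        # Start the backward trace one step in so the center counts once.
        bwd = trace_sum(binary, flow, x0 - np.cos(a0), y0 - np.sin(a0),
                        -1.0, reach)
        profile[b] = fwd + bwd
    return profile
```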
The sequence of text line positions is clustered into K groups $\{p_i^k\}_{k=1}^{K}$, such that each group $\{p_i^k\}_{i=1}^{n_k}$ satisfies Eq. 3 within an error threshold. The purpose of the clustering is to separate text paragraphs, and to remove paragraphs that have fewer than three lines.
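One simple greedy realization of this clustering is sketched below; the tolerance value and the greedy strategy are assumptions, not the paper's exact procedure, and the positions are assumed distinct and sorted along the ruling:

```python
import numpy as np

def eq3_residual(q):
    """Deviation of four consecutive 1D positions from the 1/4 cross ratio."""
    q0, q1, q2, q3 = q
    return abs((q1 - q0) * (q3 - q2) / ((q2 - q0) * (q3 - q1)) - 0.25)

def cluster_text_lines(a, tol=0.03):
    """Greedily grow groups of text-line positions a while Eq. 3 holds
    within tol; groups with fewer than three lines are discarded."""
    groups, current = [], [a[0]]
    for x in a[1:]:
        candidate = current + [x]
        if len(candidate) < 4 or eq3_residual(candidate[-4:]) < tol:
            current = candidate
        else:
            groups.append(current)
            current = [x]
    groups.append(current)
    return [g for g in groups if len(g) >= 3]
```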
To find the best vanishing point $v$ that satisfies Eq. 4 for every group $\{p_i^k\}_{i=1}^{n_k}$, we first represent $p_i^k$ by its 1D coordinate $a_i^k$ along the ruling $r$ (the origin can be any point on $r$). We write $a_i^k = b_i^k + e_i^k$, where $e_i^k$ is the error term and $b_i^k$ is the true but unknown position of the text line. Under the assumption that $e_i^k$ follows a normal distribution, the best $v$ should minimize the error function.
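The paper's error function lies beyond this excerpt, but Eq. 4 already yields a simple estimator: it is linear in $v$ for each consecutive triple of line positions, so per-triple closed-form estimates can be pooled. A sketch under that simplification (not the paper's exact minimizer):

```python
import numpy as np

def vanishing_point_1d(a):
    """Estimate the 1D vanishing-point coordinate v along a ruling from
    text-line positions a (equally spaced in 3D). Eq. 4 is linear in v
    for each consecutive triple; the median pools the estimates."""
    estimates = []
    for a0, a1, a2 in zip(a, a[1:], a[2:]):
        # (a1 - a0)(v - a2) / ((a2 - a0)(v - a1)) = 1/2, solved for v:
        den = 2.0 * (a1 - a0) - (a2 - a0)
        if abs(den) > 1e-9:                  # den ~ 0: v is at infinity
            num = 2.0 * (a1 - a0) * a2 - (a2 - a0) * a1
            estimates.append(num / den)
    return float(np.median(estimates)) if estimates else None
```

When the image spacings are equal (ruling parallel to the image plane), the denominator vanishes and the vanishing point recedes to infinity, which matches the geometry.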

References

- "Evaluation of binarization methods for document images" (journal article)
- "Metric rectification for perspective images of planes" (conference paper)
- "Global and local document degradation models" (conference paper)
- "Image restoration of arbitrarily warped documents" (journal article)
- "Structure of Applicable Surfaces from Single Views" (book chapter)