Multi-Resolution Probabilistic Information Fusion for Camera-based Document Image Matching

Sumantra Dutta Roy, Sitanshu Gupta, Ishaan Gupta, Kavita Bhardwaj, Santanu Chaudhury
Dept of EE, IIT Delhi, New Delhi - 110016, INDIA
Dept of ECE, MNNIT, Allahabad, INDIA
Dept of ECE, NSIT, New Delhi - 110078, INDIA
Email: sumantra@ee.iitd.ac.in, sitanshu.08@gmail.com, ishaan09@gmail.com, kavitabhardwaj.iitd@gmail.com, schaudhury@gmail.com
Abstract: Given a part of a document image taken with any camera at an arbitrary orientation, sometimes under far-from-perfect illumination, an important problem is to match this query image to the corresponding full image in a document database. We propose a novel, robust multi-resolution methodology for this problem. The method combines information from independent sources of measurement in a probabilistic framework. The proposed method is invariant to a large range of distortions and illumination changes, and is relatively resilient to noise and to unmodelled objects present as clutter. To the best of our knowledge, no related work addresses all these issues.
Keywords: Multi-resolution Analysis, Projective Transformation, Homography, Robust Error Norm, Contour Envelope of Text Block, Geometric Hashing, Probabilistic Information Fusion
I. INTRODUCTION
We present a novel multi-resolution probabilistic method for matching a database document to a degraded query image (for instance, one taken with a low-quality camera in bad illumination, possibly with a part of the document occluded by other objects). The ubiquity of mobile phones with cameras makes camera-based document image analysis an important research area [1], [2]. An interesting development in public image collections such as Flickr is the presence of images stored at multiple resolutions. Our work explores the speed-resolution trade-off to propose an efficient matching strategy. Our system does not consider database organisation issues which affect efficiency in image retrieval: we present a novel approach to the image matching problem.

Document image retrieval includes problems such as queries about layout [3] and logos [4]. These methods do not suffice when the query image is a partial snapshot of the text part of a document. Nakai et al. [1] consider the problem of camera-based document image retrieval. The authors model distortions using projective invariants (cross ratios). They take the centroids of word regions as features, and use cross ratios to vote for documents in a Hough transform-like manner, with a few problem-specific heuristics. The number of feature points is often too large, and the voting-based procedure takes too much running time. In [2], Liu and Doermann propose an approach in which layout context dictates local features. Since this relies on the layout of words, it fails when there is only a small amount of text present in the captured image. It also approximates the projective transform locally, so it is not invariant to significant perspective distortions. The Kise group extend their earlier ideas in [1] and experiment with affine and projective models. In [5], Nakai et al. propose an approach in which combinations of local invariants are hashed into a large hash table. The hash table gives the process polynomial time complexity. In [6], Nakai and Kise use their Locally Likely Arrangement Hashing (LLAH) with affine invariants, using a neighbourhood assumption. This simplifies the polynomial time complexity of a geometric hashing-based strategy to a linear one.

We propose an alternate strategy based on a multi-resolution approach. We use more than one feature, for robustness. One feature is similar to that of the Kise group (contour envelope curvature extrema as opposed to their word centroids) with geometric hashing: on average, we deal with far fewer points than they do. Further, a multi-resolution approach, in which we often operate at a very low scale, reduces our computational load immensely. We could apply an LLAH-like strategy to reduce this load even further. Existing algorithms do not consider images with noisy elements such as a pen, a hand, or small objects, as shown in Fig. 4. Our proposed technique is entirely script- and language-independent: it does not use any features of any specific script or language (e.g., Fig. 3). To the best of our knowledge, no relevant work addresses all these issues.
II. A ROBUST MULTI-RESOLUTION APPROACH WITH
MULTIPLE INDEPENDENT SOURCES OF MEASUREMENT
In this section, we first examine the wide variety in query images that can be submitted to the system (Sec. II-A). Sec. II-B considers fusion of probability estimates from multiple independent sources of measurement. Sections II-C and II-D consider the two features used in this work, namely text/image blocks and the extrema points of contour envelopes, and discuss issues related to handling these features at multiple levels of resolution. Sec. II-E explains the pre-processing steps needed to extract these two features from a given document image.

A. Wide Variations in Query Images: Geometric Deformations, Illumination Variations, Noise
Database images are generally taken in good imaging conditions, with good and uniform illumination and zero skew. For a query image, a common situation is to have a part of a document imaged by a common camera (a cellphone camera, for instance) at an arbitrary orientation, and possibly in a region of bad illumination. Further, there could be structured and/or unstructured noise in the image: imaging noise, or other objects occluding parts of the document.

In general, the geometric deformation could be non-linear. We use a general linear model to approximate the deformation: a 2-D projective transformation. The fundamental theorem of Plane Projective Geometry (extensively cited in [7]) relates any two planes in higher dimensional space using a 2-D projective transform. Hence, the features used for matching must either be projective invariants, or allow the homography between the two projective planes to be estimated.

To counter the effects of illumination variation in the query image, we use a relative gradient image [8]:
$$ I(x, y) = \frac{|\nabla F(x, y)|}{\max_{(u,v) \in W(x,y)} |\nabla F(u, v)| + c} \tag{1} $$
where $I$ is the relative image gradient. In this equation, $F(x, y)$ denotes the image intensity, $\nabla$ denotes the intensity gradient, $W(x, y)$ is a local window centred at pixel $(x, y)$, and $c$ is a small positive constant used to avoid division by zero. We replace all expressions involving pixel intensities with the relative gradient defined above.
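As an illustration of Eq. (1), the following NumPy sketch computes a relative gradient image. The function name `relative_gradient`, the square window of side `window`, the edge padding at the image border, and the default value of `c` are assumptions made for this example; none of them is specified in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relative_gradient(F, window=5, c=1e-3):
    """Relative gradient image of Eq. (1) for a 2-D intensity array F.

    `window` (odd side length of W) and `c` are illustrative defaults.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it is centred on a pixel")
    F = np.asarray(F, dtype=float)
    gy, gx = np.gradient(F)                 # finite-difference gradient of F
    mag = np.hypot(gx, gy)                  # |grad F(x, y)|
    pad = window // 2
    padded = np.pad(mag, pad, mode="edge")  # assumed border handling
    # Maximum of |grad F| over the window W(x, y) centred on each pixel.
    local_max = sliding_window_view(padded, (window, window)).max(axis=(2, 3))
    return mag / (local_max + c)
```

Because the numerator and the denominator scale together, multiplying `F` by a constant gain leaves the output almost unchanged (exactly unchanged as `c` tends to 0), which is the property used here against illumination variation.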
B. Multiple Sources of Measurement
The proposed technique is independent of the specific sources of measurements for different features. The database of documents $D$ contains $m$ documents $D_1, D_2 \ldots D_m$. Each document $D_i$ has text/image blocks $d_{ik_j}$. Consider a query image $Q$ with $n$ text/image blocks $q_1, q_2 \ldots q_n$. Consider block $q_j$ in the query image. This could correspond to any block $d_{ik_j}$. Let $P_{f_l}(q_j | d_{ik_j})$ denote the probability of query image block $q_j$ corresponding to block $d_{ik_j}$ in document $D_i$, obtained using feature $f_l$. Using the features $f_l$ (which come from independent sources of measurement), we define the total probability of the query image block $q_j$ being block $d_{ik_j}$ as
$$ P(q_j | d_{ik_j}) = \prod_l P_{f_l}(q_j | d_{ik_j}) \tag{2} $$
For our experiments, we use two features: the bounding quadrilateral around the text/image block (Sec. II-C) and the block contour envelope curvature extrema projective co-ordinates (Sec. II-D). Sections II-C and II-D describe the computation of the corresponding $P_{f_l}(q_j | d_{ik_j})$ for the two cases, respectively.
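A minimal sketch of the fusion rule of Eq. (2), under the stated independence assumption. The function name `fuse_feature_probabilities` and the input format (one probability per feature for a single query/database block pair) are hypothetical choices for this example.

```python
import math


def fuse_feature_probabilities(per_feature_probs):
    """Eq. (2): product of the per-feature probabilities for one block pair."""
    probs = list(per_feature_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(probs)


# Two features, as in the experiments: quadrilateral match and curvature-extrema match.
fused = fuse_feature_probabilities([0.9, 0.8])
```

With a product rule, a feature that assigns a probability of zero vetoes the match regardless of the other feature, so in practice one may wish to keep the individual estimates strictly positive.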
C. Script-Independent Matching of Text/Image Blocks
The first feature that we use is the set of four corner points of the bounding quadrilateral of a text or an image block (Section II-E outlines the basic pre-processing steps in our system). A block $q_j$ in the query image $Q$ could correspond to a database block $d_{ik_j}$ of document $D_i$. We model the probability of the block in question being $d_{ik_j}$, given that we have observed query image block $q_j$, as follows:
$$ P_{f_l}(d_{ik_j} | q_j) = 1 - \frac{1}{R} \sum_r \rho(x_r, \sigma_1) \tag{3} $$
Here, $\rho(x, \sigma)$ denotes the robust error norm [8], where $\sigma$ is a scale factor:
$$ \rho(x, \sigma) = \frac{x^2}{x^2 + \sigma^2} \tag{4} $$
The above summation is over all pixels $r$ in the warped query block, with respect to the corresponding pixels in the database document block $d_{ik_j}$, and $R$ is the total number of such pixels. Here, $x$ denotes the pixel intensity difference (based on the relative gradient: Sec. II-A) between corresponding pixels of the projected query block and a database document block, for a given pixel location. The advantage of taking $\rho(x, \sigma)$ in place of $x^2$ (or a normalised version of it, for that matter) is that the robust error norm is less sensitive to outliers: since $\rho(\cdot)$ saturates at 1, no single pixel can contribute more than $1/R$ to the error term. For an outlier pixel with $|x| > \sigma/\sqrt{3}$, the influence on the solution decreases as $\rho(\cdot)$ approaches 1.
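The block probability of Eqs. (3) and (4) can be sketched as below. The function names are hypothetical, and the inputs are assumed to be two equally sized arrays of relative-gradient values (the warped query block and the database block). The `influence` helper is the derivative of the norm with respect to x; it is not part of the text and is included only to show numerically where the influence of a residual starts to fall.

```python
import numpy as np


def rho(x, sigma):
    """Robust error norm of Eq. (4)."""
    x = np.asarray(x, dtype=float)
    return x**2 / (x**2 + sigma**2)


def block_match_probability(warped_query, db_block, sigma):
    """Eq. (3): one minus the mean robust error over the R compared pixels."""
    x = np.asarray(warped_query, dtype=float) - np.asarray(db_block, dtype=float)
    return 1.0 - rho(x, sigma).mean()


def influence(x, sigma):
    """Derivative of rho with respect to x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * x * sigma**2 / (x**2 + sigma**2) ** 2


# A single gross outlier among R = 16 pixels lowers the probability by at most 1/16.
clean = np.zeros((4, 4))
corrupted = clean.copy()
corrupted[0, 0] = 1e6
p_outlier = block_match_probability(corrupted, clean, sigma=1.0)
```

For this norm the derivative peaks at |x| = σ/√3 and decays beyond it, so residuals larger than that pull on the estimate less and less, whereas the influence of a squared error grows without bound.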
1) Selecting the Right Scale: The system starts at the smallest resolution. Database images are stored at different resolutions. The system uses the information from the smearing algorithm (Sec. II-E) to obtain an estimate of the font size (we assume that documents will have at least some text in them).
D. Geometric Hashing-based Matching of Contour Envelope
Curvature Extrema Projective Co-ordinates
From the basic pre-processing steps of Sec. II-E, the second feature we use is the set of curvature extrema of the contour envelope. For an image at any level in the Gaussian pyramid, smearing results in a text block. For each such block, we use the standard parametric representation for curvature:
$$ \kappa = \frac{|x' y'' - x'' y'|}{(x'^2 + y'^2)^{3/2}} \tag{5} $$
Here, the $x$ and $y$ co-ordinates of every contour pixel are taken to be functions of the index of the point along the contour, and the derivatives ($y'$, $x'$, $y''$, $x''$) are accordingly calculated using finite-difference approximations.
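A sketch of Eq. (5) on a sampled contour. It assumes a closed contour given as two equally long coordinate arrays, and uses central differences with wrap-around; the function names and the simple neighbour test for extrema are choices made for this example, not details taken from the text.

```python
import numpy as np


def contour_curvature(xs, ys):
    """Curvature of Eq. (5) at every sample of a closed contour."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Central differences with respect to the point index (wrap-around).
    dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0
    dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
    ddx = np.roll(xs, -1) - 2.0 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2.0 * ys + np.roll(ys, 1)
    num = np.abs(dx * ddy - ddx * dy)
    den = (dx**2 + dy**2) ** 1.5
    return num / np.maximum(den, 1e-12)  # guard against repeated points


def curvature_extrema(kappa):
    """Indices where kappa is a strict local maximum or minimum."""
    prev, nxt = np.roll(kappa, 1), np.roll(kappa, -1)
    is_max = (kappa > prev) & (kappa > nxt)
    is_min = (kappa < prev) & (kappa < nxt)
    return np.flatnonzero(is_max | is_min)


# Example: an ellipse has four curvature extrema, at the ends of its two axes.
t = np.arange(360) * (2.0 * np.pi / 360)
extrema = curvature_extrema(contour_curvature(3.0 * np.cos(t), np.sin(t)))
```

On real contours the curvature estimate is noisy, so some smoothing before the extremum test is usually needed; that step is left out here.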
Consider a text block $q_j$ from a query image $Q$ containing $N_j$ curvature extrema points. To consider the match with block $d_{ik_j}$ with $M_j$ curvature extrema points, we note that a naive strategy to match $N_j$ points with $M_j$ points would incur exponential time complexity. To reduce this to polynomial time, we use a Geometric Hashing-based strategy. We consider a hash table for both the database document block $d_{ik_j}$ and the query block $q_j$. We can select an ordered set of 4 basis points from the database block in $\binom{M_j}{4} \times 4!$ ways: this is $O(M_j^4)$. (We can reduce this by not considering all $4!$ orderings, since realistic imaging conditions preclude all but 4 of these [7].) For every quartet of basis points selected, a hash table stores the projective coordinates of the remaining $M_j - 4$ curvature extrema points. We perform the same procedure for a query image block.
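The text does not say how the projective coordinates are computed. One standard construction, used in the sketch below as an assumption, maps the four ordered basis points to the corners of the unit square and expresses every other point in that frame; the resulting coordinates are unchanged by any 2-D projective transform applied to the whole block. All names (`UNIT_SQUARE`, `homography_from_4_points`, `projective_coordinates`) are hypothetical.

```python
import numpy as np

# Canonical frame: the ordered basis is sent to the unit square (assumed convention).
UNIT_SQUARE = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])


def homography_from_4_points(src, dst):
    """3x3 matrix H with dst ~ H @ src, from four point pairs (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the null vector of the 8x9 system: the last right singular vector.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def apply_homography(H, pts):
    """Apply H to an (n, 2) array of points, returning an (n, 2) array."""
    pts = np.asarray(pts, dtype=float)
    hom = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]


def projective_coordinates(basis, others):
    """Coordinates of `others` in the frame defined by the four `basis` points."""
    H = homography_from_4_points(np.asarray(basis, dtype=float), UNIT_SQUARE)
    return apply_homography(H, others)
```

One row of the hash table would then hold `projective_coordinates(basis, others)` for one ordered quartet `basis`; the basis points must be in general position (no three collinear) for the homography to be well defined.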
Block matching between a database document and query block reduces to matching rows of two hash tables. Matching two table rows has quadratic time complexity. We can reduce this to linear if each hash table row is sorted. Hence, the problem of matching curvature extrema reduces to $O(M_j^5 N_j^5) \times$ the row matching time.
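The linear-time comparison of two sorted rows can be sketched as a single merge pass. The quantisation of the projective coordinates into integer cells (`quantise_row`, with a hypothetical cell size `step`) is an assumption added so that entries can be compared for equality; the text does not describe how coordinates are compared.

```python
def quantise_row(coords, step=0.05):
    """Quantise (u, v) coordinates to integer cells and return them sorted."""
    return sorted((round(u / step), round(v / step)) for u, v in coords)


def count_matches_sorted(row_a, row_b):
    """Count common entries of two sorted rows in one pass over both."""
    i = j = matches = 0
    while i < len(row_a) and j < len(row_b):
        if row_a[i] == row_b[j]:
            matches += 1
            i += 1
            j += 1
        elif row_a[i] < row_b[j]:
            i += 1  # row_a[i] cannot appear later in row_b
        else:
            j += 1  # row_b[j] cannot appear later in row_a
    return matches


row_db = quantise_row([(0.5, 0.5), (0.2, 0.8), (1.5, 0.1)])
row_query = quantise_row([(0.21, 0.79), (0.5, 0.52), (3.0, 3.0)])
common = count_matches_sorted(row_db, row_query)
```

Each index only moves forward, so the pass takes time proportional to the combined length of the two rows, compared with the product of the lengths for an all-pairs comparison.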
1) Selecting the Right Scale: Just as the resolution determines the number of pixels in a block (Sec. II-C1), it determines the number of contour extrema in a curve (contour) represented at different resolutions/scales.
E. Feature Extraction
Both features in Sections II-C and II-D have a common processing pipeline. Images are stored at multiple levels of resolution. The first step is the application of a run-length smearing algorithm [9]. We learn the smearing parameters (such as the horizontal and vertical run-lengths) from a large number of documents at different scales, and store them in a look-up table. For an image at a given scale, we use a simple sequential labelling-based segmentation algorithm to find the number of connected regions (blocks).

For the first feature, we use a Hough Transform-based method to fit a quadrilateral around a text/image block, provided it is greater than a particular size (this is again a scale-dependent parameter). We do this only for blocks around which it is possible to fit four lines. For the second feature (curvature extrema points), we operate directly on the contour of the extracted block.
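To make the smearing step concrete, the sketch below implements the classic form of run-length smearing: background runs no longer than a threshold, lying between two foreground pixels, are filled along rows and along columns, and the two results are combined with a logical AND. This is an assumption for illustration; the variant of [9] and the learned run-length values in the look-up table may differ. The names `smear_rows`, `smear`, `h_gap` and `v_gap` are hypothetical.

```python
import numpy as np


def smear_rows(binary, max_gap):
    """Fill background runs of length <= max_gap between foreground pixels in each row."""
    img = np.asarray(binary, dtype=bool)
    out = img.copy()
    for r in range(img.shape[0]):
        cols = np.flatnonzero(img[r])
        for a, b in zip(cols[:-1], cols[1:]):
            if 1 < b - a <= max_gap + 1:  # gap of b - a - 1 background pixels
                out[r, a + 1:b] = True
    return out


def smear(binary, h_gap, v_gap):
    """Horizontal and vertical smearing combined with a logical AND."""
    img = np.asarray(binary, dtype=bool)
    horizontal = smear_rows(img, h_gap)
    vertical = smear_rows(img.T, v_gap).T
    return horizontal & vertical
```

Connected-component labelling of the smeared image then yields the candidate text/image blocks; larger values of `h_gap` and `v_gap` merge characters into words, lines, and eventually whole blocks.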
III. PROBABILISTIC HYPOTHESIS GENERATION
Given a database of $m$ documents, we initialise $P(D_i) = 1/m$, where $P(D_i)$ is the a priori probability of document $D_i$. We are given a query image $Q$ (which contains $n$ blocks $q_j$). A query block $q_j$ could correspond to a database document block $d_{ik_j}$ in document $D_i$. Based on the features in Sections II-C and II-D, we compute the probability that a particular query image block $q_j$ corresponds to database document block $d_{ik_j}$ as follows. For a system with $l$ features $f_1, f_2 \ldots f_l$, we compute this probability as
$$ P(d_{ik_j} | q_j) = \prod_l P_{f_l}(d_{ik_j} | q_j) \tag{6} $$
This is reasonable, since we assume that the $l$ features and their measurement processes are independent. In our case, we have considered $l = 2$ for the number of features.

Fig. 1: The system works in cases of bad illumination conditions: an illustration. (a) The original query image, (b) the binarised image, (c) the quadrilateral block feature, (d) the text contour and its curvature extrema, (e) the homography-transformed query block, and (f) the error image: the difference between the database document block and the projected query image block.

Given $n$ blocks detected in the query image $Q$, the system forms hypotheses corresponding to the correct identity of each query block $q_j$, $1 \le j \le n$. We note that while all query image blocks have to correspond to one document $D_i$, one may have more than one hypothesis corresponding to a document $D_i$. Given that we have observed query image blocks $q_1 \ldots q_n$, we compute the probability that these $n$ blocks correspond to blocks $d_{i_1} \ldots d_{i_n}$ of document image $D_i$:
$$ P(d_{i_1} \ldots d_{i_n} | q_1 \ldots q_n) = \frac{P(q_1 \ldots q_n | d_{i_1} \ldots d_{i_n}) \, P(d_{i_1} \ldots d_{i_n})}{\sum_t P(q_1 \ldots q_n | d_{t_1} \ldots d_{t_n}) \, P(d_{t_1} \ldots d_{t_n})} \tag{7} $$
The summation in the denominator is over all hypotheses $t$ corresponding to the identity of all $n$ blocks $q_1, q_2 \ldots q_n$.
Further, the second term in the numerator of Eq. (7) can be simplified as follows:
$$ P(d_{i_1}, d_{i_2} \ldots d_{i_n}) = P(d_{i_1}, d_{i_2} \ldots d_{i_n} | D_i) \, P(D_i) \tag{8} $$
Here, $P(D_i)$ is the a priori probability of document $D_i$, and the other terms may be approximated by the relative areas of the document blocks in the corresponding database image, in the same orientation. We compute the first term of the numerator of Eq. (7) as follows:
$$ P(q_1 \ldots q_n | d_{i_1} \ldots d_{i_n}) = \prod_{j=1}^{n} P(q_j | d_{ik_j}) \tag{9} $$
We compute the final a posteriori probability of document $D_i$ as the sum of the probabilities of all individual hypotheses corresponding to the particular document $D_i$:
$$ P(D_i) = \sum_t P(d_{t_1}, d_{t_2} \ldots d_{t_n} | q_1 \ldots q_n) \tag{10} $$
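Eqs. (7) to (10) can be traced in a few lines of code. In this sketch each hypothesis is a dictionary with a document index, the per-block likelihoods $P(q_j | d_{ik_j})$, and a single number standing in for the area-based term of Eq. (8). This data layout, the key names and the function name `document_posteriors` are assumptions made for the example.

```python
import math
from collections import defaultdict


def document_posteriors(hypotheses, num_documents):
    """Posterior probability of each document, following Eqs. (7)-(10)."""
    prior_doc = 1.0 / num_documents                       # P(D_i) = 1/m
    joint = []
    for h in hypotheses:
        likelihood = math.prod(h["block_likelihoods"])    # Eq. (9)
        prior = h["block_prior"] * prior_doc              # Eq. (8)
        joint.append(likelihood * prior)                  # numerator of Eq. (7)
    total = sum(joint)                                    # denominator of Eq. (7)
    if total == 0.0:
        raise ValueError("all hypotheses have zero probability")
    posterior = defaultdict(float)
    for h, value in zip(hypotheses, joint):
        posterior[h["doc"]] += value / total              # Eq. (10)
    return dict(posterior)


# Two hypotheses for document 0 and one for document 1, with made-up numbers.
example = [
    {"doc": 0, "block_likelihoods": [0.9, 0.8], "block_prior": 0.5},
    {"doc": 0, "block_likelihoods": [0.2, 0.1], "block_prior": 0.5},
    {"doc": 1, "block_likelihoods": [0.3, 0.3], "block_prior": 0.5},
]
post = document_posteriors(example, num_documents=2)
```

The numbers in `example` are invented purely to exercise the arithmetic; the document with the largest posterior would be reported as the match.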
IV. EXPERIMENTAL RESULTS AND DISCUSSION
This paper represents work in progress. We have a set
of 50 database document images, and 100 query images.
The database document images have maximum size at the
highest resolution level of 2340×1700, and the corresponding
maximum figure for query images is 2848 × 1600.
1) Experiments with large illumination variations: Fig. 1
shows an example of successful matching in spite of bad
illumination conditions.

Fig. 2: Performance of the proposed technique on a highly skewed query image. The sub-parts (a)-(f) correspond to those in Fig. 1.
Fig. 3: Script and language independence: successful matching of a Chinese language document. The sub-parts (a)-(f) are the same as in Fig. 1.
2) Highly skewed query image: Fig. 2 shows an example
of successful matching in spite of a large amount of skew in
the query image.
3) Script and language independence: An advantage of our
system is that it is independent of the specific language/script
used in a document image. Fig. 3 shows an example of
successful matching on a Chinese language document page.
4) Cases of occlusions, structured noise: Fig. 4 shows an example of successful matching in spite of structured noise: in this case, some text hidden by a finger, a common occurrence for images of documents taken with a cellphone camera or a hand-held camera.
5) Miscellaneous failure cases: Out of 70 query images
the system gave correct results (matched the corresponding
database document) in 65 cases. The 5 failure cases were due
to errors at the block building and feature detection stage itself.
Fig. 5 shows such a case.
Some statistics for the above cases are as follows. For 5 out of 10 query images with insufficient illumination, it was not possible to separate the blocks from the background. For the 20 images with occlusions and structured noise, there were 7 failures, either because the occluding object was at the corner of a block (resulting in a wrong bounding quadrilateral), or because it resulted in more contour curvature extrema from the occluding object than from the actual text block.

Fig. 4: Successful matching in spite of structured noise: some text hidden by a finger, a common occurrence for hand-held document images taken with a cellphone or a camera.
Fig. 5: A failure case due to errors at the block building and feature detection stage itself.
REFERENCES
[1] T. Nakai, K. Kise, and M. Iwamura, “Camera-Based Document Image Retrieval as Voting for Partial Signatures of Projective Invariants,” in ICS, 2005.
[2] X. Liu and D. Doermann, “Mobile retriever-Finding the Document with a snapshot,” in Int. Workshop on Camera-Based Document Analysis and Recognition, 2007, pp. 29–34.
[3] P. Hermann and G. Schlageter, “Retrieval of Document images using layout knowledge,” in Proceedings of the Second International Conference on Document Analysis and Recognition, 1993, pp. 537–540.
[4] D. Doermann, E. Rivlin, and I. Weiss, “Applying algebraic and differential invariants for logo recognition,” Mach. Vision Appl., vol. 9, pp. 73–86, 1996.
[5] T. Nakai, K. Kise, and M. Iwamura, “Hashing with Local Combinations of Feature Points and its Application to Camera-based Document Image Retrieval,” Proc. CBDAR05, pp. 87–94, 2005.
[6] T. Nakai, K. Kise, and M. Iwamura, “Use of Affine invariants in Locally likely Arrangement Hashing for Camera-based Document Image Retrieval,” Document Analysis Systems VII, pp. 541–552, 2006.
[7] C. Rothwell, “Recognition using Projective Invariance,” Ph.D. dissertation, University of Oxford, 1993.
[8] S.-D. Wei and S.-H. Lai, “Robust and Efficient Image Alignment Based on Relative Gradient Matching,” IP, vol. 15, no. 10, pp. 2936–2943, 2006.
[9] H. Cao, R. Prasad, P. Natarajan, and E. MacRostie, “Robust Page Segmentation Based on Smearing and Error Correction Unifying Top-Down and Bottom-Up Approaches,” in ICDAR, 2007.
References
More filters
Book ChapterDOI
13 Feb 2006
TL;DR: This paper introduces into LLAH an affine invariant instead of the perspective invariant so as to improve its adjustability and experimental results show that the use of the affines enables us to improve either the accuracy from 96.2% to 97.8%, or the retrieval time from 112 msec./query to 75 msec./ query by selecting parameters of processing.
Abstract: Camera-based document image retrieval is a task of searching document images from the database based on query images captured using digital cameras. For this task, it is required to solve the problem of “perspective distortion” of images,as well as to establish a way of matching document images efficiently. To solve these problems we have proposed a method called Locally Likely Arrangement Hashing (LLAH) which is characterized by both the use of a perspective invariant to cope with the distortion and the efficiency: LLAH only requires O(N) time where N is the number of feature points that describe the query image. In this paper, we introduce into LLAH an affine invariant instead of the perspective invariant so as to improve its adjustability. Experimental results show that the use of the affine invariant enables us to improve either the accuracy from 96.2% to 97.8%, or the retrieval time from 112 msec./query to 75 msec./query by selecting parameters of processing.

121 citations


"Multi-Resolution Probabilistic Info..." refers background in this paper

  • ...The Kise group extend their earlier ideas in [1] and experiment with affine and projective models....

    [...]

Journal ArticleDOI
01 Sep 1996
TL;DR: The problem of logo recognition is of great interest in the document domain, especially for document databases, and if the authors are given a logo block candidate and adocumentdatabase, they wish to determine whether there are any documents in the database of similar origin.
Abstract: The problem of logo recognition is of great interest in the document domain, especially for document databases. By recognizing the logo we obtain semantic information about the document which may be useful in deciding whether or not to analyze the textual components. Given a logo block candidate from a document image and alogo database, we would like to determine whether the region corresponds to a logo in the database. Similarly, if we are given a logo block candidate and adocumentdatabase, we wish to determine whether there are any documents in the database of similar origin. Both problems require indexing into a possibly large model space.

105 citations

Proceedings ArticleDOI
20 Oct 1993
TL;DR: A layout editor is proposed, which allows the interactive generation of a layout query using object oriented drawing functions, and can be done in a relational database, which was filled with the help of layout recognition methods.
Abstract: Document image archives are increasingly used to replace paper and microfilm filing. Usually those archives are combined with a database management system or with full text retrieval to search the documents. An additional retrieval method to search already known images in personal document image archives using layout knowledge is presented. This knowledge can be the size, position, and color of layout objects, but also the position of keywords. A layout editor is proposed, which allows the interactive generation of a layout query using object oriented drawing functions. The layout search can be done in a relational database, which was filled with the help of layout recognition methods. For better results and performance special search methods and structures are necessary. Examples for those methods are quad trees to locate layout objects at absolute positions, neighborhood tables to find object pairs with certain spatial relationships, and full text search in page or object areas. >

34 citations


"Multi-Resolution Probabilistic Info..." refers background in this paper

  • ...Our system does not consider database organisation issues which affect efficiency in image retrieval: we present a novel approach to the image matching problem....

    [...]

Journal ArticleDOI
TL;DR: Experimental results on both simulated and real images are shown to demonstrate superior efficiency and robustness of the proposed algorithm over the conventional normalized correlation method.
Abstract: In this paper, we present a robust image alignment algorithm based on matching of relative gradient maps. This algorithm consists of two stages; namely, a learning-based approximate pattern search and an iterative energy-minimization procedure for matching relative image gradient. The first stage finds some candidate poses of the pattern from the image through a fast nearest-neighbor search of the best match of the relative gradient features computed from training database of feature vectors, which are obtained from the synthesis of the geometrically transformed template image with the transformation parameters uniformly sampled from a given transformation parameter space. Subsequently, the candidate poses are further verified and refined by matching the relative gradient images through an iterative energy-minimization procedure. This approach based on the matching of relative gradients is robust against nonuniform illumination variations. Experimental results on both simulated and real images are shown to demonstrate superior efficiency and robustness of the proposed algorithm over the conventional normalized correlation method

27 citations


"Multi-Resolution Probabilistic Info..." refers background in this paper

  • ...To the best of our knowledge, no relevant work addresses all these issues....

    [...]

Proceedings ArticleDOI
31 Aug 2005
TL;DR: The proposed method takes as input a part or the whole of a document acquired as a query by a digital camera, and retrieves a document image that includes the query, and retrieval as voting for partial signatures of document images defined by the cross-ratios.
Abstract: We propose a method of document image retrieval using digital cameras The proposed method takes as input a part or the whole of a document acquired as a query by a digital camera, and retrieves a document image that includes the query For this purpose, it is required to solve the problem of "perspective distortion" of images, as well as to establish a way of matching parts of document images flexibly These are achieved based on the following characteristics of the proposed method: (1) indexing of document images using the projective invariants called the "cross-ratios", (2) retrieval as voting for partial signatures of document images defined by the cross-ratios From experimental results using digital cameras with high and low resolutions, we demonstrate the effectiveness of the proposed method

26 citations


"Multi-Resolution Probabilistic Info..." refers background or methods in this paper

  • ...Since this relies on the layout of words, it fails when there is a small amount of text present in captured image....

    [...]

  • ...…Probabilistic Information Fusion I. INTRODUCTION We present a novel multi-resolution probabilistic method for matching a database document to a degraded query image (for instance, taken from a low quality camera in bad illumination and even with a part of the document occluded with other objects.)...

    [...]

Frequently Asked Questions (13)
Q1. What have the authors contributed in "Multi-resolution probabilistic information fusion for camera-based document image matching"?

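The authors propose a novel multi-resolution robust methodology for matching a camera-captured query image, showing part of a document, to the corresponding full image in a document database.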

The features used for matching have to be either projectively invariant or suitable for estimating the homography between the two projective planes.

1) Selecting the Right Scale: Just as the resolution determines the number of pixels in a block (Sec. II-C1), it determines the number of contour extrema in a curve (contour) represented at different resolutions/scales.

For the 20 images with occlusions and structured noise, there were 7 failures, either because the occluding object was at the corner of a block (resulting in a wrong bounding quadrilateral), or because the occluding object produced more contour curvature extrema than the actual text block.

Given $n$ blocks detected in the query image $Q$, the system forms hypotheses about the correct identity of each query block $q_j$, $1 \le j \le n$.

For a query image, a common situation is to have a part of a document image taken by a common camera (a cellphone camera, for instance), and at an arbitrary orientation, and possibly in a region of bad illumination. 

There could be structured and/or unstructured noise in the image: imaging noise, or other objects occluding parts of the document.

The authors model the probability of the block in question being $d_{ikj}$, given that they have observed query image block $q_j$, as follows:

$$P_{f_l}(d_{ikj} \mid q_j) = 1 - \frac{1}{R} \sum_r \rho(x_r, \sigma_1) \qquad (3)$$

Here, $\rho(x, \sigma)$ denotes the robust error norm [8], where $\sigma$ is a scale factor:

$$\rho(x, \sigma) = \frac{x^2}{x^2 + \sigma^2} \qquad (4)$$

The above summation is over all pixels $r$ in the warped query block, with respect to the corresponding pixels in the database document block $d_{ikj}$, and $R$ is the total number of such pixels.
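Equations (3) and (4) can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes that $x_r$ is the intensity difference between the warped query block and the database block at pixel $r$, and the function names, the toy arrays, and the value of `sigma` are hypothetical.

```python
import numpy as np

def robust_norm(x, sigma):
    """Robust error norm of Eq. (4): x^2 / (x^2 + sigma^2), always in [0, 1)."""
    x = np.asarray(x, dtype=float)
    return x**2 / (x**2 + sigma**2)

def block_match_probability(warped_query, db_block, sigma):
    """Eq. (3): one minus the mean robust norm over all R pixels.

    Assumption: x_r is the per-pixel intensity difference between the
    warped query block and the database block.
    """
    x = np.asarray(warped_query, dtype=float) - np.asarray(db_block, dtype=float)
    return 1.0 - robust_norm(x, sigma).mean()

# Toy 2x2 blocks (hypothetical data): one query pixel is grossly wrong,
# e.g. because an occluding object covers it.
db = np.array([[10.0, 20.0], [30.0, 40.0]])
query = db.copy()
query[0, 0] = 250.0

p_same = block_match_probability(db, db, sigma=10.0)
p_outlier = block_match_probability(query, db, sigma=10.0)
print(p_same, p_outlier)
```

Because the norm saturates below 1, a single outlier among $R$ pixels can lower the score by at most $1/R$, which is what makes the measure tolerant of small occlusions.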

Using the features $f_l$ (which come from independent sources of measurement), the authors define the total probability of the query image block $q_j$ being block $d_{ikj}$ as

$$P(q_j \mid d_{ikj}) = \prod_l P_{f_l}(q_j \mid d_{ikj}) \qquad (2)$$

For their experiments, the authors use two features: the bounding quadrilateral around the text/image block (Sec. II-C) and the block contour envelope curvature extrema projective coordinates (Sec. II-D).
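The product rule of Eq. (2) can be sketched as follows. The candidate names and per-feature scores are made up for illustration, and picking the highest fused score is an assumption about how the fused probabilities might be used, not a description of the authors' decision rule.

```python
import math

def fuse(feature_probs):
    """Eq. (2): multiply the per-feature probabilities of one hypothesis."""
    return math.prod(feature_probs)

# Hypothetical scores per candidate database block:
# (bounding-quadrilateral feature, contour-extrema feature)
candidates = {
    "block_A": (0.90, 0.80),
    "block_B": (0.95, 0.30),
}

fused = {name: fuse(probs) for name, probs in candidates.items()}
best = max(fused, key=fused.get)
print(fused, best)
```

With a product, a candidate that scores very well on one feature but poorly on the other (block_B here) is penalised, so both sources of measurement must agree for a hypothesis to rank highly.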

The fundamental theorem of Plane Projective Geometry (extensively cited in [7]) relates any two planes in higher dimensional space using a 2-D projective transform. 
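As a concrete illustration of a 2-D projective transform relating two planes, the sketch below recovers a homography from four point correspondences using the standard direct linear transform (DLT). The excerpt does not say how the authors estimate the transform, so DLT is a stand-in technique, and the function names and the matrix `H_true` are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform from at least 4 point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked constraints.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes h[2, 2] is non-zero

def apply_homography(h, pts):
    """Map 2-D points through h using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical ground-truth transform and the corners of a unit square.
H_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 1.1, -2.0],
                   [0.1, 0.2, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = apply_homography(H_true, src)

H_est = estimate_homography(src, dst)
print(np.round(H_est, 6))
```

Four correspondences with no three points collinear determine the transform up to scale, which is why the four corners of a bounding quadrilateral are enough in this toy setting.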

Sections II-C and II-D consider the two features used in this work, namely text/image blocks and the extrema points of contour envelopes, and discuss issues related to handling these features at multiple levels of resolution.

The database document images have maximum size at the highest resolution level of 2340×1700, and the corresponding maximum figure for query images is 2848×1600.

1) Experiments with large illumination variations: Fig. 1 shows an example of successful matching in spite of bad illumination conditions.

For an image at a given scale, the authors use a simple sequential labelling-based segmentation algorithm to find the number of connected regions (blocks).
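A sequential labelling pass of this kind can be sketched as a two-pass connected-component labelling with a union-find table. The excerpt does not give the authors' exact variant, so the binary input, the 4-connectivity, and the function name `count_blocks` are assumptions made for illustration.

```python
import numpy as np

def count_blocks(binary):
    """Two-pass sequential labelling (4-connectivity).

    Returns (label image, number of connected regions).
    """
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    rows, cols = binary.shape
    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            up = labels[r - 1, c] if r > 0 else 0
            left = labels[r, c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))
                labels[r, c] = len(parent) - 1
            elif up and left:
                a, b = find(up), find(left)
                parent[max(a, b)] = min(a, b)
                labels[r, c] = min(a, b)
            else:
                labels[r, c] = max(up, left)
    # Second pass: replace each label by a compact id of its root.
    roots = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                root = find(labels[r, c])
                labels[r, c] = roots.setdefault(root, len(roots) + 1)
    return labels, len(roots)

# Toy binary image (hypothetical): an L-shape, a U-shape, and a single pixel.
img = [[1, 1, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [0, 0, 1, 0, 1],
       [1, 0, 1, 1, 1]]
labels, n_blocks = count_blocks(img)
print(n_blocks)
```

The U-shaped region receives two provisional labels in the first pass; the equivalence recorded where its two arms meet is what merges them into one block in the second pass.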