Multi-Resolution Probabilistic Information Fusion for Camera-based Document Image Matching
Introduction
- Document image retrieval includes problems such as queries about layout [3] and logos [4].
- The authors model distortions using projective invariants (cross ratios).
- The number of feature points is often too large, and the voting-based procedure takes too much running time.
- This reduces the polynomial time complexity of a geometric hashing-based strategy to a linear one.
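Cross ratios are useful here because they are preserved by projective transformations. The short check below is illustrative only and not taken from the paper. It represents four collinear points by their 1-D positions along their common line, applies an arbitrary 1-D projective map to them (the map coefficients and the point positions are made-up values), and compares the cross ratio before and after. Exact rational arithmetic via `fractions.Fraction` avoids floating-point noise in the comparison.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (A, B; C, D) of four collinear points given by their 1-D
    positions along the line: ((c - a)(d - b)) / ((c - b)(d - a))."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_1d(x, p, q, r, s):
    """A 1-D projective map x -> (p*x + q) / (r*x + s); requires p*s - q*r != 0
    and r*x + s != 0 at every point it is applied to."""
    return (p * x + q) / (r * x + s)


# Hypothetical collinear points and an arbitrary projective map (p, q, r, s) = (2, 1, 1, 5).
points = [Fraction(v) for v in (0, 1, 3, 7)]
mapped = [projective_1d(x, 2, 1, 1, 5) for x in points]

print(cross_ratio(*points), cross_ratio(*mapped))
```

Both values are equal, although the individual distances between the points change under the map; this is why a cross ratio can serve as a feature that survives perspective distortion.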
II. A ROBUST MULTI-RESOLUTION APPROACH WITH
- The authors first examine the wide variations in the query images that can be submitted to the system (Sec. II-A).
- Sec. II-B considers fusion of probability estimates from multiple independent sources of measurement.
- Sections II-C and II-D consider the two features used in this work, namely text/image blocks and the extrema points of contour envelopes, and discuss issues in handling these features at multiple levels of resolution.
- Sec. II-E explains the preprocessing steps used to extract these two features from a given document image.
A. Wide Variations in Query Images: Geometric Deformations, Illumination Variations, Noise
- Database images are generally taken in good imaging conditions with good and uniform illumination and zero skew.
- For a query image, a common situation is to have a part of a document image taken by a common camera (a cellphone camera, for instance), and at an arbitrary orientation, and possibly in a region of bad illumination.
- In general, the geometric deformation could be non-linear.
- The fundamental theorem of Plane Projective Geometry (extensively cited in [7]) relates any two planes in a higher-dimensional space through a 2-D projective transform (a homography).
- $F(x, y)$ denotes the image intensity, $\nabla$ denotes the intensity gradient, $W(x, y)$ is a local window centred at pixel $(x, y)$, and $c$ is a small positive constant used to avoid division by zero.
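A 2-D projective transform has eight degrees of freedom, so four point correspondences with no three points collinear determine it. This is one reason the four corners of a bounding quadrilateral (Sec. II-C) are a convenient feature for warping a query block onto a database block. The pure-Python sketch below is an illustration, not the authors' implementation: it estimates the homography $H$ by the direct linear transform with $h_{33}$ fixed to 1 (an assumption that fails only in the rare case where the true $h_{33}$ is 0). The function names and the corner coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear corners)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """3x3 homography (h33 fixed to 1) mapping four src points onto four dst points.
    Each correspondence (x, y) -> (u, v) contributes two linear equations."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2-D point through h using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a skewed query block, in the same order
# (top-left, top-right, bottom-right, bottom-left) as the target rectangle.
query_corners = [(10, 20), (210, 40), (200, 160), (5, 140)]
canonical = [(0, 0), (200, 0), (200, 120), (0, 120)]
H = homography_from_4(query_corners, canonical)
print(apply_homography(H, query_corners[2]))  # close to (200, 120)
```

The corner ordering must be consistent between the two quadrilaterals; a wrong ordering still yields a homography, but one that warps the block incorrectly.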
B. Multiple Sources of Measurement
- The proposed technique is independent of the specific sources of measurements for different features.
- A query block $q_j$ could correspond to any block $d_{ik_j}$ of a database document $D_i$.
- Let $P_{f_l}(q_j \mid d_{ik_j})$ denote the probability of query image block $q_j$ corresponding to block $d_{ik_j}$ in document $D_i$, obtained using feature $f_l$.
- Sections II-C and II-D describe the computation of the corresponding $P_{f_l}(q_j \mid d_{ik_j})$ for the two features, respectively.
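Because the sources of measurement are treated as independent, the per-feature probabilities are fused by multiplication (Eq. (2) of the paper). The helper below is a minimal sketch of that product, not the authors' code; the function name and the two probability values are hypothetical. It sums logarithms rather than multiplying directly so that the same helper does not underflow if many features are fused.

```python
import math


def fuse_independent(probabilities):
    """Product of per-feature probabilities P_fl(q_j | d_ikj).
    Valid only under the independence assumption stated in the text.
    Computed in log space to avoid underflow with many features."""
    if any(p <= 0.0 for p in probabilities):
        return 0.0  # any impossible feature match makes the fused match impossible
    return math.exp(sum(math.log(p) for p in probabilities))


# Hypothetical values for the two features used in this work:
# bounding-quadrilateral match and curvature-extrema match.
p_quad, p_extrema = 0.9, 0.8
print(fuse_independent([p_quad, p_extrema]))
```

Since this is a plain product, one weak feature pulls the fused probability down sharply, which is the intended behaviour when every feature is expected to agree on a true match.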
C. Script-Independent Matching of Text/Image Blocks
- The first feature that the authors use is the set of four corner points of the bounding quadrilateral of a text or an image block.
- (Section II-E outlines the basic pre-processing steps in their system).
- The advantage of using $\rho(x, \sigma)$ in place of $x^2$ (or a normalised version of it) is that the robust error norm is less sensitive to outliers: $\rho$ saturates towards 1 for large $|x|$, whereas $x^2$ grows without bound.
- The system starts at the smallest resolution.
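The matching probability for this feature is $1 - \frac{1}{R}\sum_r \rho(x_r, \sigma_1)$ (Eq. (3)), with $\rho(x, \sigma) = x^2 / (x^2 + \sigma^2)$ (Eq. (4)). The sketch below evaluates it on tiny made-up blocks. It assumes, since the text does not say so explicitly, that $x_r$ is the intensity difference between pixel $r$ of the warped query block and the corresponding database pixel; the function names, pixel values and $\sigma$ are hypothetical.

```python
def rho(x, sigma):
    """Robust error norm of Eq. (4): x^2 / (x^2 + sigma^2), for sigma > 0.
    It lies in [0, 1), so one outlier pixel contributes at most 1/R below."""
    return x * x / (x * x + sigma * sigma)


def block_match_probability(query_pixels, db_pixels, sigma):
    """Eq. (3): 1 - (1/R) * sum_r rho(x_r, sigma).
    Assumption: x_r = warped-query intensity minus database intensity at pixel r."""
    if len(query_pixels) != len(db_pixels) or not query_pixels:
        raise ValueError("blocks must be non-empty and have the same number of pixels")
    residuals = [q - d for q, d in zip(query_pixels, db_pixels)]
    return 1.0 - sum(rho(x, sigma) for x in residuals) / len(residuals)


# Hypothetical 4-pixel blocks (flattened intensities).
db_block = [100, 120, 130, 90]
clean_query = [102, 118, 131, 91]
occluded_query = [102, 118, 131, 255]  # one pixel covered by an occluding object

print(block_match_probability(clean_query, db_block, sigma=10.0))
print(block_match_probability(occluded_query, db_block, sigma=10.0))
```

With a squared error, the single occluded pixel (residual 165, squared 27225) would dominate the sum; with $\rho$ it adds less than 1 before averaging, so the occluded block still scores well above zero.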
D. Geometric Hashing-based Matching of Contour Envelope Curvature Extrema Projective Co-ordinates
- From the basic pre-processing steps of Sec. II-E, the second feature the authors use is the curvature extrema of the contour envelope.
- For an image at any level in the Gaussian pyramid, smearing merges the text into solid text blocks.
- The authors consider a hash table for both the database document block $d_{ik_j}$ and the query block $q_j$.
- The authors can reduce this to linear time if each hash table row is sorted.
- Hence, the problem of matching curvature extrema reduces to $O(M_j^5 N_j^5)$ times the row-matching time.
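Sorting is what makes row matching linear: two sorted rows can be compared with a single merge-style scan instead of testing every pair of entries. The sketch below illustrates that idea only. It assumes, hypothetically, that a row holds quantised projective coordinates of curvature extrema stored as integer tuples (Python compares tuples lexicographically); the paper's actual row layout is not given here.

```python
def count_common_sorted(row_a, row_b):
    """Count entries shared by two ascending-sorted hash-table rows in
    O(len(row_a) + len(row_b)) time with a two-pointer (merge-style) scan,
    instead of the quadratic all-pairs comparison."""
    i = j = common = 0
    while i < len(row_a) and j < len(row_b):
        if row_a[i] == row_b[j]:
            common += 1
            i += 1
            j += 1
        elif row_a[i] < row_b[j]:
            i += 1  # row_a[i] cannot appear later in row_b
        else:
            j += 1  # row_b[j] cannot appear later in row_a
    return common


# Hypothetical rows of quantised projective coordinates.
query_row = sorted([(1, 3), (2, 5), (4, 4), (7, 1)])
db_row = sorted([(0, 2), (2, 5), (4, 4), (6, 6), (7, 1)])
print(count_common_sorted(query_row, db_row))
```

Database rows can be sorted once, offline, so at query time only the query rows need sorting.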
E. Feature Extraction
- Both features in Sections II-C and II-D have a common processing pipeline.
- Images are stored at multiple levels of resolution.
- The first step is the application of a run-length smearing algorithm (RLSA) [9].
- For the first feature, the authors use a Hough Transform-based method to fit a quadrilateral around a text/image block, provided it is larger than a particular size (this is again a scale-dependent parameter).
- The authors do this only for blocks for which it is possible to fit four lines around it.
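Run-length smearing fills short background gaps between ink pixels, so that characters and words merge into solid blocks. The sketch below shows the horizontal and vertical smearing passes on a binary image stored as lists of 0/1; it is a simplified illustration, not the implementation in [9]. In this variant only gaps bounded by ink on both sides are filled, and the classic RLSA additionally combines the passes. The thresholds are scale-dependent, so in a multi-resolution setting each pyramid level would get its own value; the values used here are hypothetical.

```python
def smear_row(row, threshold):
    """Set to 1 every run of 0s of length <= threshold that lies between two 1s
    (1 = ink, 0 = background). Returns a new list."""
    out = list(row)
    last_ink = None
    for i, value in enumerate(row):
        if value == 1:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= threshold:
                for k in range(last_ink + 1, i):
                    out[k] = 1
            last_ink = i
    return out


def smear_horizontal(image, threshold):
    """Smear every row of a binary image."""
    return [smear_row(row, threshold) for row in image]


def smear_vertical(image, threshold):
    """Smear every column by transposing, smearing rows, and transposing back."""
    columns = [list(col) for col in zip(*image)]
    smeared = [smear_row(col, threshold) for col in columns]
    return [list(row) for row in zip(*smeared)]


print(smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2))
```

The gap of two pixels is filled while the gap of four is kept, which is how the threshold separates intra-block spacing from inter-block spacing.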
III. PROBABILISTIC HYPOTHESIS GENERATION
- The authors are given a query image $Q$ containing $n$ blocks $q_j$.
- A query block $q_j$ could correspond to a database document block $d_{ik_j}$ in document $D_i$. Based on the features in Sections II-C and II-D, the authors compute the probability that a particular query image block $q_j$ corresponds to database document block $d_{ik_j}$ as follows.
- For a system with $l$ features $f_1, f_2, \ldots, f_l$, the authors compute this probability as
$$P(d_{ik_j} \mid q_j) = \prod_l P_{f_l}(d_{ik_j} \mid q_j) \qquad (6)$$
This is reasonable, since they assume that the $l$ features and their measurement processes are independent.
- The authors note that while all query image blocks have to correspond to one document $D_i$, there may be more than one hypothesis corresponding to a document $D_i$. Given the observed query image blocks $q_1, \ldots, q_n$, they compute the probability that these $n$ blocks correspond to blocks $d_{i1}, \ldots, d_{in}$ of document image $D_i$.
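The text does not spell out how the block-level probabilities are combined into a document-level score. The sketch below is one plausible reading, stated as an assumption rather than the authors' formula: blocks are treated as independent, so a document hypothesis is scored by the product (here, the sum of logarithms) of its block probabilities. For each document it keeps the best hypothesis per query block, requires all $n$ query blocks to map to that document, and ignores the constraint that two query blocks should not share one database block. The function name, tuple layout and values are hypothetical.

```python
import math
from collections import defaultdict


def rank_documents(hypotheses, n_blocks):
    """hypotheses: iterable of (doc_id, j, p), meaning query block q_j matches some
    block of document doc_id with fused probability p (assumed independent).
    Returns (doc_id, log_score) pairs sorted from best to worst."""
    best = defaultdict(dict)  # doc_id -> {j: best p for that query block}
    for doc, j, p in hypotheses:
        if p > best[doc].get(j, 0.0):
            best[doc][j] = p
    scores = {}
    for doc, per_block in best.items():
        if len(per_block) < n_blocks:
            continue  # all query blocks must correspond to the same document
        scores[doc] = sum(math.log(p) for p in per_block.values())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hypotheses for a query image with two blocks.
hyps = [("D1", 1, 0.9), ("D1", 2, 0.7),
        ("D2", 1, 0.95), ("D2", 2, 0.2),
        ("D3", 1, 0.99)]  # D3 explains only one block, so it is discarded
print(rank_documents(hyps, n_blocks=2))
```

Working with log scores keeps the ranking stable when a query image contains many blocks, since a product of many probabilities below 1 quickly approaches zero.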
IV. EXPERIMENTAL RESULTS AND DISCUSSION
- The authors have a set of 50 database document images, and 100 query images.
- The database document images have a maximum size of 2340 × 1700 pixels at the highest resolution level; the corresponding maximum for query images is 2848 × 1600 pixels.
- 2) Highly skewed query image: Fig. 2 shows an example of successful matching in spite of a large amount of skew in the query image.
- Some statistics for the above cases are as follows.
- For the 20 images with occlusions and structured noise, there were 7 failures, either because the occluding object was at the corner of a block (resulting in a wrong bounding quadrilateral) or because it produced more contour curvature extrema than the actual text block did.
References
- "The Kise group extend their earlier ideas in [1] and experiment with affine and projective models."
- "Our system does not consider database organisation issues which affect efficiency in image retrieval: we present a novel approach to the image matching problem."
- "To the best of our knowledge, no relevant work addresses all these issues."
- "Since this relies on the layout of words, it fails when there is a small amount of text present in captured image."
- "We present a novel multi-resolution probabilistic method for matching a database document to a degraded query image (for instance, taken from a low quality camera in bad illumination and even with a part of the document occluded with other objects)."
Frequently Asked Questions
Q2. What is the basic theorem of Plane Projective Geometry?
As a consequence, the features used for matching must either be projective invariants, or the matching must estimate the homography between the two projective planes.
Q3. What is the procedure for determining the right scale?
1) Selecting the Right Scale: Just as the resolution determines the number of pixels in a block (Sec. II-C1), it determines the number of contour extrema in a curve (contour) represented at different resolutions/scales.
Q5. How many blocks are detected in the query image?
Given $n$ blocks detected in the query image $Q$, the system forms a hypothesis about the correct identity of each query block $q_j$, $1 \le j \le n$.
Q7. What could be the source of noise in the image?
There could be structured and/or unstructured noise in the image: imaging noise, or other objects occluding parts of the document.
Q8. What is the probability of a block being $d_{ik_j}$?
The authors model the probability of the block in question being $d_{ik_j}$, given that they have observed query image block $q_j$, as follows:
$$P_{f_l}(d_{ik_j} \mid q_j) = 1 - \frac{1}{R} \sum_r \rho(x_r, \sigma_1) \qquad (3)$$
Here, $\rho(x, \sigma)$ denotes the robust error norm [8], where $\sigma$ is a scale factor:
$$\rho(x, \sigma) = \frac{x^2}{x^2 + \sigma^2} \qquad (4)$$
The above summation is over all pixels $r$ in the warped query block, with respect to the corresponding pixels in the database document block $d_{ik_j}$, and $R$ is the total number of such pixels.
Q9. What is the probability of a query image block being $d_{ik_j}$?
Using the features $f_l$ (which come from independent sources of measurement), the authors define the total probability of the query image block $q_j$ being block $d_{ik_j}$ as
$$P(q_j \mid d_{ik_j}) = \prod_l P_{f_l}(q_j \mid d_{ik_j}) \qquad (2)$$
For their experiments, the authors use two features: the bounding quadrilateral around the text/image block (Sec. II-C) and the block contour envelope curvature extrema projective coordinates (Sec. II-D).
Q12. What is the probability of a query image block qj being matched to a?
The database document images have a maximum size of 2340 × 1700 pixels at the highest resolution level; the corresponding maximum for query images is 2848 × 1600 pixels.
1) Experiments with large illumination variations: Fig. 1 shows an example of successful matching in spite of bad illumination conditions.
Q13. What is the procedure for finding the number of connected regions?
For an image at a given scale, the authors use a simple sequential labelling-based segmentation algorithm to find the number of connected regions (blocks).
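Sequential labelling scans the image once in raster order, gives each foreground pixel the label of an already-visited neighbour (or a new label), and records when two labels turn out to belong to the same region. The sketch below counts regions that way; it is an illustration, not the authors' code. It assumes 4-connectivity (the text does not say which connectivity is used) and resolves label equivalences with a union-find structure. Because only the count is needed, the usual second relabelling pass is omitted. The function name and the test image are hypothetical.

```python
def count_regions(image):
    """Number of 4-connected foreground regions in a binary image (1 = foreground),
    via one raster pass of sequential labelling plus union-find on label equivalences."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for y in range(rows):
        for x in range(cols):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            neighbours = [lab for lab in (up, left) if lab]
            if not neighbours:
                parent[next_label] = next_label  # start a new provisional region
                labels[y][x] = next_label
                next_label += 1
            else:
                labels[y][x] = min(neighbours)
                if len(neighbours) == 2:
                    union(up, left)  # both labels belong to one region
    return len({find(lab) for lab in parent})


image = [[1, 1, 0, 0, 1],
         [0, 1, 0, 1, 1],
         [0, 0, 0, 0, 0],
         [1, 0, 1, 1, 0]]
print(count_regions(image))
```

The union step matters for shapes such as a "U", whose two arms receive different provisional labels that are only found to meet further down the image.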