Document Image Retrieval through Word Shape Coding

doi:10.1109/TPAMI.2008.89

Journal Article

Shijian Lu, +2 more

- 01 Nov 2008 -

IEEE Transactions on Pattern Analysis an...

- Vol. 30, Iss: 11, pp 1913-1918

Abstract:

This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
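
The idea of annotating each word with a shape code and then retrieving by keyword can be sketched as follows. This is a toy illustration under loud assumptions, not the authors' scheme: the paper extracts its features from word images, whereas this sketch works on plain text, uses hypothetical letter classes (`ASCENDERS`, `DESCENDERS`, `HOLES`), and ignores water reservoirs entirely. All function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical letter classes, chosen for illustration only. The paper
# derives its features from word images, not from character identities.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_shape(ch):
    """Return a two-symbol toy shape code for one letter."""
    ch = ch.lower()
    if ch in ASCENDERS:
        position = "A"   # rises above the x-height
    elif ch in DESCENDERS:
        position = "D"   # drops below the baseline
    else:
        position = "x"   # stays within the x-height
    hole = "o" if ch in HOLES else "-"
    return position + hole


def word_shape_code(word):
    """Concatenate the toy shape codes of the letters in a word."""
    return "".join(char_shape(c) for c in word if c.isalpha())


def build_index(documents):
    """Map each word shape code to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            code = word_shape_code(word)
            if code:
                index[code].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents holding a word with the same code."""
    return sorted(index.get(word_shape_code(keyword), set()))


if __name__ == "__main__":
    docs = {"d1": "the dog barks", "d2": "a quiet hill"}
    idx = build_index(docs)
    print(search(idx, "dog"))
    print(word_shape_code("dog"), word_shape_code("bag"))
```

In this toy coding, "dog" and "bag" receive the same code, which shows why a coarse shape code can produce collisions between different words; a richer feature set would be expected to reduce them.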

Citations

Journal Article

Word Spotting and Recognition with Embedded Attributes

Jon Almazan, +3 more

- 17 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, so that recognition and retrieval tasks can be cast as a nearest neighbor problem; the representation is very fast to compute and, especially, to compare.

Journal Article

Mobile Visual Location Recognition

Georg Schroth, +5 more

- 16 Jun 2011 -

IEEE Signal Processing Magazine

TL;DR: Using video recordings from a mobile device as a visual fingerprint of the environment and matching them to a georeferenced database provides pose information in a very natural way, without complex infrastructure, in areas where the accuracy and availability of GPS are limited.

Journal Article

Scene Text Recognition in Mobile Applications by Character Descriptor and Structure Configuration

Chucai Yi, +1 more

- 16 Apr 2014 -

IEEE Transactions on Image Processing

TL;DR: A method of scene text recognition from detected text regions using a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors is proposed.

Journal Article

Word spotting in historical printed documents using shape and sequence comparisons

Khurram Khurshid, +2 more

- 01 Jul 2012 -

Pattern Recognition

TL;DR: This work presents a word spotting method for scanned documents in order to find the word images that are similar to a query word, without assuming a correct segmentation of the words into characters.

Proceedings Article

Keyword Spotting in Document Images through Word Shape Coding

Shuyong Bai, +2 more

TL;DR: The proposed technique is tolerant to serifs, font styles, and certain degrees of touching, broken, or overlapping characters. It improves over previous work not only in precision and collision rate but, more importantly, in its ability to perform partial matching.

References

Journal Article

A threshold selection method from gray level histograms

Nobuyuki Otsu

- 01 Jan 1979 -

IEEE Transactions on Systems, Man, and C...

Book

Introduction to Modern Information Retrieval

Gerard Salton, +1 more

TL;DR: Reading is a need and a hobby at once, and this condition is the one that will make you feel that you must read.

Proceedings Article

A re-examination of text categorization methods

Yiming Yang, +1 more

TL;DR: The results show that SVM, kNN, and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.

Journal Article

Content-based multimedia information retrieval: State of the art and challenges

Michael S. Lew, +3 more

- 01 Feb 2006 -

ACM Transactions on Multimedia Computing...

TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.

Journal Article

Evaluation of binarization methods for document images

O.D. Trier, +1 more

- 01 Mar 1995 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper presents an evaluation of eleven locally adaptive binarization methods for gray scale images with low contrast, variable background intensity, and noise. Niblack's method with the addition of the postprocessing step of Yanowitz and Bruckstein's method (1989) performed the best and was also one of the fastest binarization methods.

Related Papers (5)

Word spotting for historical documents

Word image matching using dynamic time warping

The Indexing and Retrieval of Document Images

Distinctive Image Features from Scale-Invariant Keypoints

Word spotting: a new approach to indexing handwriting