Journal ArticleDOI

Document Image Retrieval through Word Shape Coding

Abstract
This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
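To make the idea of annotating words with shape codes and retrieving by keyword concrete, here is a minimal sketch in Python. It is not the paper's implementation: the actual technique extracts features from word images, whereas this toy version simulates them from text using hypothetical per-letter lookup sets (`ASCENDERS`, `DESCENDERS`, `HOLES`) chosen only for illustration. Water reservoirs are omitted, and all function names are assumptions.

```python
from collections import defaultdict

# Hypothetical per-letter feature sets, for illustration only. A real system
# would measure these properties on the word image, not look them up.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch):
    """Map one letter to a coarse shape symbol built from its features."""
    flags = (
        ("A", ch in ASCENDERS),
        ("D", ch in DESCENDERS),
        ("H", ch in HOLES),
    )
    # "x" marks a letter with none of the three features (e.g. "c", "n").
    return "".join(name for name, present in flags if present) or "x"


def word_shape_code(word):
    """Annotate a word with the sequence of its per-letter shape symbols."""
    return "-".join(char_code(c) for c in word.lower() if c.isalpha())


def build_index(docs):
    """Build an inverted index from word shape code to document ids."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word_shape_code(word)].add(doc_id)
    return index


def search(index, keyword):
    """Return ids of documents containing a word with the keyword's code."""
    return index.get(word_shape_code(keyword), set())


docs = {"d1": ["the", "dog"], "d2": ["a", "cat"]}
index = build_index(docs)
print(search(index, "dog"))
print(word_shape_code("dog") == word_shape_code("bog"))
```

In this simplified scheme "dog" and "bog" receive the same code, which illustrates why a coarse feature set produces collisions and why a richer feature set helps separate such words.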


Citations
Journal ArticleDOI

Word Spotting and Recognition with Embedded Attributes

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing recognition and retrieval tasks to be cast as a nearest neighbor problem; the representation is very fast to compute and, especially, to compare.
Journal ArticleDOI

Mobile Visual Location Recognition

TL;DR: Treating video recordings from a mobile device as a visual fingerprint of the environment and matching them to a georeferenced database provides pose information in a very natural way, without complex infrastructure, in areas where the accuracy and availability of GPS is limited.
Journal ArticleDOI

Scene Text Recognition in Mobile Applications by Character Descriptor and Structure Configuration

TL;DR: A method of scene text recognition from detected text regions using a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors is proposed.
Journal ArticleDOI

Word spotting in historical printed documents using shape and sequence comparisons

TL;DR: This work presents a word spotting method for scanned documents in order to find the word images that are similar to a query word, without assuming a correct segmentation of the words into characters.
Proceedings ArticleDOI

Keyword Spotting in Document Images through Word Shape Coding

TL;DR: The proposed technique is tolerant to serifs, font styles, and certain degrees of touching, broken, or overlapping characters, and improves over previous works with not only better precision and a lower collision rate but, more importantly, the ability to perform partial matching.