Journal Article
Document Image Retrieval through Word Shape Coding
TLDR
The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code.
Abstract:
This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
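The idea of annotating words with shape codes and retrieving by keyword can be illustrated with a minimal text-level sketch. This is not the authors' implementation: the paper extracts features from word images, whereas the sketch below simulates only two of the features (ascenders/descenders and holes) with hand-picked letter sets and omits water reservoirs entirely. The letter sets, the code symbols, and the function names `char_code`, `word_shape_code`, `build_index`, and `search` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical letter sets approximating shape features of lowercase
# letters in a common printed font (an assumption, not from the paper).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch):
    """Map one lowercase letter to a one-symbol shape class."""
    has_hole = ch in HOLES
    if ch in ASCENDERS:
        return "B" if has_hole else "L"
    if ch in DESCENDERS:
        return "P" if has_hole else "J"
    return "o" if has_hole else "x"


def word_shape_code(word):
    """Concatenate the shape classes of the letters in a word."""
    return "".join(char_code(c) for c in word.lower() if c.isalpha())


def build_index(docs):
    """Build an inverted index from word shape code to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            code = word_shape_code(word)
            if code:
                index[code].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents containing a matching code."""
    return sorted(index.get(word_shape_code(keyword), set()))


docs = {"d1": "word shape coding", "d2": "image retrieval"}
index = build_index(docs)
print(word_shape_code("shape"))   # xLoPo
print(search(index, "shape"))     # ['d1']
```

Because several letters share a shape class, distinct words can map to the same code (here "bad" and "dab" both become `BoB`). A coarse code like this trades some precision for not needing character recognition, which is why richer feature sets reduce such collisions.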
Citations
Journal Article
Word Spotting and Recognition with Embedded Attributes
TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the resulting representation is very fast to compute and, especially, to compare.
Journal Article
Mobile Visual Location Recognition
Georg Schroth, Robert Huitl, David Chen, Mohammad Abu-Alqumsan, Anas Al-Nuaimi, Eckehard Steinbach
TL;DR: Using video recordings from a mobile device as a visual fingerprint of the environment and matching them to a georeferenced database provides pose information in a very natural way, which can be obtained without complex infrastructure in areas where the accuracy and availability of GPS are limited.
Journal Article
Scene Text Recognition in Mobile Applications by Character Descriptor and Structure Configuration
Chucai Yi, Yingli Tian
TL;DR: A method is proposed for recognizing scene text from detected text regions, using a discriminative character descriptor that combines several state-of-the-art feature detectors and descriptors.
Journal Article
Word spotting in historical printed documents using shape and sequence comparisons
TL;DR: This work presents a word spotting method for scanned documents in order to find the word images that are similar to a query word, without assuming a correct segmentation of the words into characters.
Proceedings Article
Keyword Spotting in Document Images through Word Shape Coding
TL;DR: The proposed technique is tolerant to serifs, font styles, and certain degrees of touching, broken, or overlapping characters. It improves over previous work with better precision, a lower collision rate, and, more importantly, the ability to match partially.
References
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
Proceedings Article
A re-examination of text categorization methods
Yiming Yang, Xin Liu
TL;DR: The results show that SVM, kNN, and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.
Journal Article
Content-based multimedia information retrieval: State of the art and challenges
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Journal Article
Evaluation of binarization methods for document images
O.D. Trier, Torfinn Taxt
TL;DR: This paper evaluates eleven locally adaptive binarization methods for gray scale images with low contrast, variable background intensity, and noise. Niblack's method with the addition of the postprocessing step of Yanowitz and Bruckstein's method (1989) performed the best and was also one of the fastest binarization methods.