Book Chapter

Word Spotting in Cluttered Environment

01 Jan 2020, Advances in Intelligent Systems and Computing (Springer Science and Business Media Deutschland GmbH), Vol. 1024, pp. 161-172
TL;DR: This paper presents a novel problem of handwritten word spotting in a cluttered environment, where a word is struck through with a line stroke; a combinatorics Vertical Projection Profile (cVPP) feature is extracted and aligned using a modified Dynamic Time Warping (DTW) algorithm.
Abstract: In this paper, we present a novel problem of handwritten word spotting in a cluttered environment, where a word is cluttered by a strike-through with a line stroke. These line strokes can be straight, slanted, broken, continuous, or wavy. The Vertical Projection Profile (VPP) feature and its modified version, the combinatorics Vertical Projection Profile (cVPP) feature, are extracted and aligned by a modified Dynamic Time Warping (DTW) algorithm. Since no dataset exists for the proposed problem, we prepared our own. We compare our method with Rath and Manmatha [6] and PHOCNet [17] for handwritten word spotting in the presence of strike-through, and achieve better results.
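The abstract names the VPP feature without defining it. As a minimal sketch, assuming the common definition (the count of ink pixels in each column of a binarized word image, with 1 = ink), the code below computes a VPP and shows how a straight horizontal strike-through changes it. The helper `add_horizontal_strike` and the toy image are hypothetical illustrations; the chapter's cVPP feature and modified DTW are not reproduced here.

```python
# Minimal sketch: vertical projection profile (VPP) of a binarized word image.
# Assumption: the image is a list of rows, 1 = ink, 0 = background.
# The chapter's cVPP variant is NOT implemented here.
from typing import List

Image = List[List[int]]


def vertical_projection_profile(image: Image) -> List[int]:
    """Return the number of ink pixels in each column."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[c] for row in image) for c in range(width)]


def add_horizontal_strike(image: Image, row_index: int) -> Image:
    """Hypothetical helper: copy `image` and draw a straight 1-pixel stroke across one row."""
    struck = [row[:] for row in image]
    struck[row_index] = [1] * len(struck[row_index])
    return struck


word = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

print(vertical_projection_profile(word))                            # [0, 3, 1, 1, 3]
print(vertical_projection_profile(add_horizontal_strike(word, 3)))  # [1, 4, 2, 2, 4]
```

Because the stroke here crosses an empty row, every column gains exactly one ink pixel; a stroke through an inked row only raises the columns where it covers background, which is one reason a plain VPP is distorted unevenly by strike-through.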
Citations
Journal Article
TL;DR: In this paper, a robust deep-learning-based framework for data exploration of ancient documents is proposed, applying transfer learning from the U-Net network and using the Pyramidal Histogram of Characters (PHOC) encoding.

4 citations
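The PHOC representation mentioned in this entry encodes which characters occur in which horizontal region of a word, at several pyramid levels. As a minimal sketch, assuming levels (2, 3, 4), a lowercase-letters-plus-digits alphabet, unigrams only, and the usual rule that a character belongs to a region when at least half of its normalized interval overlaps it, the code below builds a binary PHOC vector. These settings are illustrative assumptions, not the configuration of the cited paper.

```python
# Minimal PHOC (Pyramidal Histogram of Characters) sketch.
# Assumptions: levels (2, 3, 4), alphabet a-z0-9, unigrams only,
# characters outside the alphabet are ignored.
from fractions import Fraction
from typing import List, Sequence
import string

ALPHABET = string.ascii_lowercase + string.digits


def phoc(word: str, levels: Sequence[int] = (2, 3, 4), alphabet: str = ALPHABET) -> List[int]:
    index = {ch: k for k, ch in enumerate(alphabet)}
    chars = word.lower()
    n = len(chars)
    vec: List[int] = []
    for level in levels:
        for r in range(level):
            hist = [0] * len(alphabet)
            # exact fractions avoid floating-point errors at the 50% boundary
            r_lo, r_hi = Fraction(r, level), Fraction(r + 1, level)
            for i, ch in enumerate(chars):
                if ch not in index:
                    continue
                c_lo, c_hi = Fraction(i, n), Fraction(i + 1, n)
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # assign the character if at least half of its interval lies in the region
                if overlap >= (c_hi - c_lo) / 2:
                    hist[index[ch]] = 1
            vec.extend(hist)
    return vec


v = phoc("ab")
print(len(v))  # 324 = 36 characters x (2 + 3 + 4) regions
```

For "ab" at level 2, 'a' falls only in the left half and 'b' only in the right half, so the first two 36-dimensional histograms differ exactly in those positions.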

References
Proceedings Article
31 Jul 1994
TL;DR: Preliminary experiments with a dynamic programming approach to pattern detection in databases, based on the dynamic time warping technique used in the speech recognition field, are described.
Abstract: Knowledge discovery in databases presents many interesting challenges within the context of providing computer tools for exploring large data archives. Electronic data repositories are growing quickly and contain data from commercial, scientific, and other domains. Much of this data is inherently temporal, such as stock prices or NASA telemetry data. Detecting patterns in such data streams or time series is an important knowledge discovery task. This paper describes some preliminary experiments with a dynamic programming approach to the problem. The pattern detection algorithm is based on the dynamic time warping technique used in the speech recognition field.

3,229 citations
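The dynamic-programming idea behind DTW can be sketched as follows. This is the textbook unconstrained formulation with an absolute-difference local cost, chosen here as an assumption for illustration; it is not claimed to be the exact variant of the 1994 paper or the modified DTW of the chapter above.

```python
# Minimal DTW sketch: unconstrained warping, absolute-difference local cost.
from typing import Callable, Sequence
import math


def dtw_distance(a: Sequence[float], b: Sequence[float],
                 cost: Callable[[float, float], float] = lambda x, y: abs(x - y)) -> float:
    n, m = len(a), len(b)
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend the best of: diagonal match, repeat b[j-1], or repeat a[i-1]
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


# A time-stretched copy of a pattern aligns at zero cost.
print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0
```

The warping lets one sample match several samples of the other series, which is why a locally stretched pattern is still detected; this cost is quadratic in the series lengths, so practical systems often add a warping window.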

Proceedings Article
18 Jun 2003
TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.
Abstract: Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index can be built automatically. We present an algorithm for matching handwritten words in noisy historical documents. The segmented word images are preprocessed to create sets of 1-dimensional features, which are then compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.

626 citations
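The abstract describes turning segmented word images into sets of 1-dimensional features that are compared with DTW. As a minimal sketch, assuming four per-column features (projection profile, upper and lower word profiles, and background-to-ink transitions), a squared-Euclidean local cost, and division by the warping-path length, the code below compares two word images. The feature set, the empty-column handling, and the normalization are assumptions for illustration, not a reproduction of the paper's exact preprocessing.

```python
# Sketch: multi-feature word-image matching with length-normalized DTW.
# Assumptions: binarized image as rows of 0/1 (1 = ink); empty columns get full height.
from typing import List, Sequence, Tuple
import math

Feature = Tuple[float, float, float, float]


def column_features(image: List[List[int]]) -> List[Feature]:
    h, w = len(image), len(image[0])
    feats: List[Feature] = []
    for c in range(w):
        col = [image[r][c] for r in range(h)]
        ink = sum(col)
        if ink:
            upper = col.index(1)        # background rows above the first ink pixel
            lower = col[::-1].index(1)  # background rows below the last ink pixel
        else:
            upper = lower = h
        # background-to-ink transitions between vertically adjacent pixels
        transitions = sum(1 for r in range(1, h) if col[r - 1] == 0 and col[r] == 1)
        feats.append((ink / h, upper / h, lower / h, float(transitions)))
    return feats


def dtw_length_normalized(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]  # length of the best path to each cell
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            best, pi, pj = min((D[i - 1][j - 1], i - 1, j - 1),
                               (D[i - 1][j], i - 1, j),
                               (D[i][j - 1], i, j - 1))
            D[i][j] = local + best
            steps[i][j] = steps[pi][pj] + 1
    return D[n][m] / steps[n][m]


word_a = [[0, 1, 0], [0, 1, 0], [1, 1, 1]]
word_b = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 1, 1]]  # word_a with its middle column doubled
word_c = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
print(dtw_length_normalized(column_features(word_a), column_features(word_b)))  # 0.0
```

Normalizing by path length keeps scores comparable between short and long words, since an unnormalized DTW sum grows with the number of aligned columns.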

Journal Article
TL;DR: It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.
Abstract: Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques including dynamic time warping are compared. The word similarities are then used for clustering using both K-means and agglomerative clustering techniques. It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.

368 citations
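The clustering step described in this abstract groups word images from their pairwise similarities. As a minimal sketch, assuming a precomputed symmetric distance matrix (for example, DTW distances between word images), average linkage, and a fixed merge threshold, the code below performs agglomerative clustering. The linkage choice, the threshold, and the toy matrix are illustrative assumptions, not the paper's settings.

```python
# Sketch: average-linkage agglomerative clustering over a word-distance matrix.
# Assumptions: `dist` is symmetric with zeros on the diagonal; merging stops
# once the closest pair of clusters is farther apart than `threshold`.
from typing import List


def agglomerative(dist: List[List[float]], threshold: float) -> List[List[int]]:
    clusters = [[i] for i in range(len(dist))]

    def avg_link(a: List[int], b: List[int]) -> float:
        # mean distance over all cross-cluster pairs
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Toy distances: words 0 and 1 look alike, as do words 2 and 3.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(agglomerative(dist, threshold=0.5))  # [[0, 1], [2, 3]]
```

Annotating one member of each resulting cluster then labels every word image in it, which is how the index can be built from a small amount of manual work.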

Journal Article
TL;DR: For a multi-writer scenario on the IAM off-line database as well as for two single writer scenarios on historical data sets, it is shown that the proposed learning-based system outperforms a standard template matching method.

293 citations