Word spotting for historical documents

doi:10.1007/S10032-006-0035-8

Journal Article•DOI•

Word spotting for historical documents

Toni M. Rath¹, R. Manmatha¹•Institutions (1)

University of Massachusetts Amherst¹

04 Apr 2007-International Journal on Document Analysis and Recognition (Springer-Verlag)-Vol. 9, Iss: 2, pp 299-299

TL;DR: It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.

Abstract: Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques including dynamic time warping are compared. The word similarities are then used for clustering using both K-means and agglomerative clustering techniques. It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.
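
The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes a precomputed symmetric matrix of distances between word images (for example, from dynamic time warping) and merges clusters by average linkage until the closest pair exceeds a threshold. The function name, the linkage choice, the threshold, and the toy distances are illustrative assumptions, not the authors' settings.

```python
def agglomerative_clusters(dist, threshold):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns clusters as sorted
    lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy distances between five hypothetical word images (assumed values).
toy = [
    [0.0, 0.2, 0.9, 1.0, 0.8],
    [0.2, 0.0, 0.8, 0.9, 0.9],
    [0.9, 0.8, 0.0, 0.1, 0.7],
    [1.0, 0.9, 0.1, 0.0, 0.6],
    [0.8, 0.9, 0.7, 0.6, 0.0],
]
clusters = agglomerative_clusters(toy, threshold=0.5)
print(clusters)
```

In a word spotting setting, each resulting cluster would then be a candidate for manual annotation, and the annotated label would be propagated to every word image in the cluster.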

Citations

Journal Article•DOI•

Reading Text in the Wild with Convolutional Neural Networks

Max Jaderberg¹, Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2016-International Journal of Computer Vision

TL;DR: An end-to-end system for text spotting—localising and recognising text in natural scene images—and text based image retrieval and a real-world application to allow thousands of hours of news footage to be instantly searchable via a text query is demonstrated.

Abstract: In this work we present an end-to-end system for text spotting--localising and recognising text in natural scene images--and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

1,054 citations

Cites background or methods from "Word spotting for historical docume..."

...This is word detection – in an ideal scenario we would be able to generate word bounding boxes with high recall and high precision, achieving this by extracting the maximum amount of information from each bounding box candidate possible....
...Our process loosely follows the detection/recognition separation – a word detection stage followed by a word recognition stage....

Book Chapter•DOI•

Deep Features for Text Spotting

Max Jaderberg¹, Andrea Vedaldi¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

06 Sep 2014

TL;DR: A Convolutional Neural Network classifier is developed that can be used for text spotting in natural images and a method of automated data mining of Flickr, that generates word and character level annotations is used to form an end-to-end, state-of-the-art text spotting system.

Abstract: The goal of this work is text spotting in natural images. This is divided into two sequential tasks: detecting word regions in the image, and recognizing the words within these regions. We make the following contributions: first, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, character case-sensitive and insensitive classification, and bigram classification. It exceeds the state-of-the-art performance for all of these. Second, we make a number of technical changes over the traditional CNN architectures, including no downsampling for a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we have a method of automated data mining of Flickr, that generates word and character level annotations. Finally, these components are used together to form an end-to-end, state-of-the-art text spotting system. We evaluate the text-spotting system on two standard benchmarks, the ICDAR Robust Reading data set and the Street View Text data set, and demonstrate improvements over the state-of-the-art on multiple measures.

681 citations

Cites background from "Word spotting for historical docume..."

...Authors have subsequently focused solely on text detection [7, 11, 16, 50, 51], or text recognition [31, 36, 41], or on combining both in end-to-end systems [40, 39, 49, 32–34, 45, 35, 6, 8, 48]....

Journal Article•DOI•

Word Spotting and Recognition with Embedded Attributes

Jon Almazan¹, Albert Gordo², Alicia Fornés¹, Ernest Valveny¹•Institutions (2)

Autonomous University of Barcelona¹, Xerox²

17 Jul 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the resulting representation is very fast to compute and, especially, to compare.

Abstract: This paper addresses the problems of word spotting and word recognition on images. In word spotting, the goal is to find all instances of a query word in a dataset of images. In recognition, the goal is to recognize the content of the word image, usually aided by a dictionary or lexicon. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression. In this subspace, images and strings that represent the same word are close together, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem. Contrary to most other existing methods, our representation has a fixed length, is low dimensional, and is very fast to compute and, especially, to compare. We test our approach on four public datasets of both handwritten documents and natural images showing results comparable or better than the state-of-the-art on spotting and recognition tasks.

522 citations

Cites background from "Word spotting for historical docume..."

...TEXT understanding in images is an important problem that has drawn a lot of attention from the computer vision community since its beginnings....
...The final PHOC histogram is the concatenation of these partial histograms....

Journal Article•DOI•

Lexicon-free handwritten word spotting using character HMMs

Andreas Fischer¹, Andreas Keller¹, Volkmar Frinken¹, Horst Bunke¹•Institutions (1)

University of Bern¹

01 May 2012-Pattern Recognition Letters

TL;DR: For a multi-writer scenario on the IAM off-line database as well as for two single writer scenarios on historical data sets, it is shown that the proposed learning-based system outperforms a standard template matching method.

293 citations

Cites background or methods from "Word spotting for historical docume..."

...DTW-based keyword spotting was proposed in [28] for speech recognition and is also well-established in the field of handwritten word spotting [15, 16, 17, 18]....
...For more details on the DTW distance algorithm, we refer to [15]....
...The proposed system is compared with a well-established template matching method based on Dynamic Time Warping (DTW) [15]....
...The DTW distance DTW(X, Y) of the word images X and Y is then given by the minimum alignment cost that is found by means of dynamic programming [15]....
..., based on word profiles [15], closed contours [16], and local gradients [17, 18]....
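
The DTW distance mentioned in the excerpts above, the minimum alignment cost found by dynamic programming, can be sketched as follows. This is a minimal illustration under stated assumptions: each word image is assumed to be already converted into a sequence of per-column feature vectors, the local cost is assumed to be the squared Euclidean distance, and no path-length normalization or band constraint is applied. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Minimum alignment cost between two sequences of feature vectors.

    x and y are sequences of equal-dimension tuples (one per image column).
    The local cost is the squared Euclidean distance (an assumption here).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            # Extend the best of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two toy one-dimensional profiles: the second stretches the middle value.
profile_a = [(0.0,), (1.0,), (2.0,)]
profile_b = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(profile_a, profile_b))
```

Because the warping path may repeat elements, the stretched profile aligns with the original at zero cost, which is the property that makes DTW tolerant of variation in handwriting width.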

Journal Article•DOI•

A Novel Word Spotting Method Based on Recurrent Neural Networks

Volkmar Frinken¹, Andreas Fischer¹, R. Manmatha², Horst Bunke¹•Institutions (2)

University of Bern¹, University of Massachusetts Amherst²

01 Feb 2012-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel keyword spotting method for handwritten documents is described, derived from a neural network-based system for unconstrained handwriting recognition, that performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set.

Abstract: Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.

283 citations

Cites background or methods from "Word spotting for historical docume..."

...Our DTW implementation, similarly to the one described in [4], makes use of a Sakoe-Chiba band [50] to speed up the computation....
...Comparing such sequences using dynamic time warping (DTW) is one of the most commonly used word spotting methods [21], [22] and is still widely used [4]....
...Certain efforts have already been put into word spotting for historical data [4], [5]....

References

Journal Article•DOI•

A simplex method for function minimization

John A. Nelder, R. Mead¹•Institutions (1)

University of Warwick¹

01 Jan 1965-The Computer Journal

TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.

Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
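
The simplex procedure summarized in this abstract can be sketched in a few lines. The version below is a simplified variant, not the paper's exact algorithm: it uses the commonly cited coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), performs only inside contraction, and runs for a fixed number of iterations rather than testing for convergence. The function name, the initial step, and the toy objective are assumptions for illustration.

```python
def nelder_mead(f, start, step=0.5, iterations=200):
    """Minimize f over n variables with a basic simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis,
    # giving the n + 1 vertices the method compares.
    simplex = [list(start)]
    for k in range(n):
        p = list(start)
        p[k] += step
        simplex.append(p)

    for _ in range(iterations):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the one with the highest value.
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex;
            # t = 1 is the worst vertex, negative t lies on the far side.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        reflected = along(-1.0)
        if f(reflected) < f(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Toy objective with its minimum at (1, -2).
result = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
print(result)
```

Since the method needs only function values and no derivatives, the same routine can be pointed at a sum-of-squared-differences objective for curve fitting by swapping in a different `f`.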

27,271 citations

"Word spotting for historical docume..." refers methods in this paper

...The fitting was performed with the “Nelder-Mead” optimization procedure [20], which minimizes the sum of squared differences between the actual vocabulary sizes and the ones...

Book•

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie¹, Robert Tibshirani, Jerome H. Friedman•Institutions (1)

University of New South Wales¹

28 Jul 2013

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.

Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

Journal Article•DOI•

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

David Ruppert

01 Jun 2004-Journal of the American Statistical Association

TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction as discussed by the authors is a popular book for data mining and machine learning, focusing on data mining, inference, and prediction.

Abstract: (2004). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association: Vol. 99, No. 466, pp. 567-567.

10,549 citations

Book•

Modern Information Retrieval

Ricardo Baeza-Yates, Berthier Ribeiro-Neto

15 May 1999

TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.

Abstract: From the Publisher: This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in volume of electronically stored information generally has led to substantial work on IR from the computer science perspective - this book provides an up-to-date student oriented treatment of the subject.

9,923 citations

Journal Article•DOI•

Shape matching and object recognition using shape contexts

Serge Belongie¹, Jitendra Malik², J. Puzicha•Institutions (2)

University of California, San Diego¹, University of California, Berkeley²

01 Apr 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents work on computing shape models that are computationally fast and invariant to basic transformations such as translation, scaling and rotation, and proposes shape detection using a feature called shape context, which is descriptive of the shape of the object.

Abstract: We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by: (1) solving for correspondences between points on the two shapes; (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits, and the COIL data set.

6,693 citations