Proceedings ArticleDOI

An Overview of the Tesseract OCR Engine

Ray Smith1
23 Sep 2007-Vol. 2, pp 629-633
TL;DR: The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview.
Abstract: The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, including in particular the line finding, features/classification methods, and the adaptive classifier.


Citations
Proceedings ArticleDOI
16 May 2010
TL;DR: The main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively.
Abstract: In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research, one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.

1,377 citations

Journal ArticleDOI
TL;DR: MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011–2016, is described and made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.
Abstract: Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's chest, but requires specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. Here we describe MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011-2016. Each imaging study can contain one or more images, usually a frontal view and a lateral view. A total of 377,110 images are available in the dataset. Studies are made available with a semi-structured free-text radiology report that describes the radiological findings of the images, written by a practicing radiologist contemporaneously during routine clinical care. All images and reports have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.

504 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the images.
Abstract: Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models can not read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0.

363 citations

Journal ArticleDOI
TL;DR: A new framework detects text strings with arbitrary orientations in complex natural scene images and outperforms the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation.
Abstract: Text information in natural scene images serves as important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text from a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) adjacent character grouping method and 2) text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into text string. The text line grouping method performs Hough transform to fit text line among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is presented by a rectangle region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out in multi-scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods to detect text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset collected by ourselves containing text strings in nonhorizontal orientations.
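The text line grouping step described above lends itself to a compact sketch: vote character centroids into quantized (theta, rho) bins and take the fullest bin as the strongest candidate text line. This is a simplified illustration, not the authors' implementation; the bin resolutions and the single-winner selection are invented here.

```python
import math
from collections import defaultdict

def hough_text_lines(centroids, theta_steps=180, rho_res=5.0):
    """Vote each centroid into quantized (theta, rho) bins; nearly
    collinear centroids accumulate in the same bin, so the fullest
    bin yields the strongest candidate text line."""
    votes = defaultdict(set)
    for x, y in centroids:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            # Normal-form line parameterization: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_res))].add((x, y))
    # Return the group of centroids backing the strongest line.
    return max(votes.values(), key=len)
```

For instance, six centroids lying on one horizontal line plus two stray blobs yield a winning bin containing exactly the six collinear centroids; the paper then represents each fitted line by a rectangle covering its characters.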

355 citations

Journal ArticleDOI
TL;DR: Experiments show that the proposed solution outperforms many previous solutions, and LPR can be better solved by solutions with settings oriented for different applications.
Abstract: We split the applications of vehicle license plate recognition (LPR) into three major categories and propose a solution with parameter settings that are adjustable for different applications. The three categories are access control (AC), law enforcement (LE), and road patrol (RP). Each application is characterized by variables of different variation scopes and thus requires different settings on the solution with which to deal. The proposed solution consists of three modules for plate detection, character segmentation, and recognition. Edge clustering is formulated for solving plate detection for the first time. It is also a novel application of the maximally stable extremal region (MSER) detector to character segmentation. A bilayer classifier, which is improved with an additional null class, is experimentally proven to be better than previous methods for character recognition. To assess the performance of the proposed solution, the application-oriented license plate (AOLP) database is composed and made available to the research community. Experiments show that the proposed solution outperforms many previous solutions, and LPR can be better solved by solutions with settings oriented for different applications.
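The idea of one pipeline with per-application parameter settings can be sketched as a small preset table. All parameter names and values below are invented for illustration; the paper's actual variables and thresholds differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LPRSettings:
    """Hypothetical tuning knobs for an LPR pipeline (illustrative only)."""
    min_plate_width_px: int      # smallest plate the detector accepts
    max_tilt_deg: float          # tolerated plate rotation
    null_class_threshold: float  # rejection cutoff for non-character segments

# Illustrative presets (values invented for this sketch): access-control
# cameras see near-frontal plates, while road-patrol images vary the most.
PRESETS = {
    "AC": LPRSettings(min_plate_width_px=80, max_tilt_deg=5.0, null_class_threshold=0.8),
    "LE": LPRSettings(min_plate_width_px=50, max_tilt_deg=15.0, null_class_threshold=0.6),
    "RP": LPRSettings(min_plate_width_px=30, max_tilt_deg=30.0, null_class_threshold=0.5),
}

def settings_for(application: str) -> LPRSettings:
    """Select the preset matching the application category."""
    return PRESETS[application]
```

The design point is that detection, segmentation, and recognition stay fixed while only such a preset changes per deployment category.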

253 citations

References
Book
01 Jan 1987
TL;DR: This book presents robust regression methods, including least median of squares estimation, along with algorithms, outlier diagnostics, and related statistical techniques.
Abstract: 1. Introduction. 2. Simple Regression. 3. Multiple Regression. 4. The Special Case of One-Dimensional Location. 5. Algorithms. 6. Outlier Diagnostics. 7. Related Statistical Techniques. References. Table of Data Sets. Index.

6,955 citations


"An Overview of the Tesseract OCR Engine" refers methods in this paper

  • ...Once the filtered blobs have been assigned to lines, a least median of squares fit [ 4 ] is used to estimate the baselines, and the filtered-out blobs are fitted back into the appropriate lines....

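The least-median-of-squares estimator cited here can be sketched with a simple random-sampling line fit: candidate lines through point pairs are scored by the median of their squared residuals, which keeps outlier blobs (e.g. punctuation or noise) from dragging the baseline. This is a minimal illustration of the estimator in [4], not Tesseract's baseline-fitting code.

```python
import random

def lmeds_line(points, trials=200, seed=0):
    """Least-median-of-squares fit of y = m*x + b.

    Repeatedly fits a line through a random pair of points and keeps the
    candidate whose squared residuals have the smallest median; up to
    half the points may be outliers without corrupting the fit.
    """
    rng = random.Random(seed)
    best, best_med = None, float("inf")
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair: skip this candidate
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        residuals = sorted((y - (m * x + b)) ** 2 for x, y in points)
        med = residuals[len(residuals) // 2]
        if med < best_med:
            best_med, best = med, (m, b)
    return best
```

With ten blobs on the line y = 2x + 1 and one gross outlier, the fit still recovers slope 2 and intercept 1, which is the property that lets the filtered-out blobs be fitted back into the right lines afterwards.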

01 Jan 1995
TL;DR: The annual test of optical character recognition systems known as “page readers” is described: each system accepts as input a bitmapped image of any document page and attempts to identify the machine-printed characters on the page.
Abstract: For four years, ISRI has conducted an annual test of optical character recognition (OCR) systems known as “page readers.” These systems accept as input a bitmapped image of any document page, and attempt to identify the machine-printed characters on the page. In the annual test, we measure the accuracy of this process by comparing the text that is produced as output with the correct text. The goals of the test include:

201 citations


"An Overview of the Tesseract OCR Engine" refers background or methods in this paper

  • ...The engine was sent to UNLV for the 1995 Annual Test of OCR Accuracy[ 1 ], where it proved its worth against the commercial engines of the time....


  • ...Like a supernova, it appeared from nowhere for the 1995 UNLV Annual Test of OCR Accuracy [ 1 ], shone brightly with its results, and then vanished back under the same cloak of secrecy under which it had been developed....


  • ...Prototype in the UNLV Fourth Annual Test of OCR Accuracy[ 1 ], is described in a comprehensive overview....


  • ...[ 1 ] More up-to-date results are at http://code.google.com/p/tesseract-ocr....


  • ...[ 1 ] of OCR accuracy, as “HP Labs OCR,” but the code has changed a lot since then, including conversion to Unicode and retraining....


Book ChapterDOI

141 citations


"An Overview of the Tesseract OCR Engine" refers background in this paper

  • ...A more traditional cubic spline [6] might work better....


Book
31 May 1999
TL;DR: A perspective on the performance of current OCR systems is offered by illustrating and explaining actual OCR errors made by three commercial devices, and possible approaches for improving the accuracy of today's systems are pointed to.
Abstract: Optical character recognition (OCR) is the most prominent and successful example of pattern recognition to date. There are thousands of research papers and dozens of OCR products. Optical Character Recognition: An Illustrated Guide to the Frontier offers a perspective on the performance of current OCR systems by illustrating and explaining actual OCR errors. The pictures and analysis provide insight into the strengths and weaknesses of current OCR systems, and a road map to future progress. Optical Character Recognition: An Illustrated Guide to the Frontier will pique the interest of users and developers of OCR products and desktop scanners, as well as teachers and students of pattern recognition, artificial intelligence, and information retrieval. The first chapter compares the character recognition abilities of humans and computers. The next four chapters present 280 illustrated examples of recognition errors, in a taxonomy consisting of Imaging Defects, Similar Symbols, Punctuation, and Typography. These examples were drawn from large-scale tests conducted by the authors. The final chapter discusses possible approaches for improving the accuracy of today's systems, and is followed by an annotated bibliography. Optical Character Recognition: An Illustrated Guide to the Frontier is suitable as a secondary text for a graduate level course on pattern recognition, artificial intelligence, and information retrieval, and as a reference for researchers and practitioners in industry.

129 citations

Journal ArticleDOI
01 Jul 1992
TL;DR: It is argued that it is time for a major change of approach to optical character recognition (OCR) research, and new OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components.
Abstract: It is argued that it is time for a major change of approach to optical character recognition (OCR) research. The traditional approach, focusing on the correct classification of isolated characters, has been exhausted. The demonstration of the superiority of a new classification method under operational conditions requires large experimental facilities and databases beyond the resources of most researchers. In any case, even perfect classification of individual characters is insufficient for the conversion of complex archival documents to a useful computer-readable form. Many practical OCR tasks require integrated treatment of entire documents and well-organized typographic and domain-specific knowledge. New OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components. They should also exploit the unavoidable interaction with human operators to improve themselves without explicit 'training'.

119 citations


"An Overview of the Tesseract OCR Engine" refers background in this paper

  • ...It has been suggested [11] and demonstrated [12] that OCR engines can benefit from the use of an adaptive classifier....

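The adaptive-classifier idea referenced here can be illustrated with a toy feedback loop: confident outputs of a static classifier become document-specific templates that rescue later low-confidence characters, improving accuracy on the document's particular fonts without explicit training. This nearest-template sketch is an invented simplification, not the classifier of [12] or Tesseract's actual implementation.

```python
class AdaptiveClassifier:
    """Toy per-document adaptation loop (illustrative nearest-template
    scheme; real engines use richer features and classifiers)."""

    def __init__(self, static_classify):
        # static_classify: feature vector -> (label, confidence in [0, 1])
        self.static_classify = static_classify
        self.templates = {}  # label -> feature vectors seen in this document

    def classify(self, features, confidence_threshold=0.9):
        label, conf = self.static_classify(features)
        if conf >= confidence_threshold:
            # Confident static result: remember it as a document template.
            self.templates.setdefault(label, []).append(features)
            return label
        # Weak static result: fall back to the nearest adapted template.
        best, best_d = label, float("inf")
        for lab, vecs in self.templates.items():
            for v in vecs:
                d = sum((a - b) ** 2 for a, b in zip(features, v))
                if d < best_d:
                    best, best_d = lab, d
        return best
```

Once a few confident characters have been seen, a degraded instance of the same glyph that the static classifier scores poorly is matched to the accumulated templates instead.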