Proceedings ArticleDOI

An Overview of the Tesseract OCR Engine

Ray Smith1
23 Sep 2007-Vol. 2, pp 629-633
TL;DR: The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview.
Abstract: The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, including in particular the line finding, features/classification methods, and the adaptive classifier.


Citations
Proceedings ArticleDOI
16 May 2010
TL;DR: The main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively.
Abstract: In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research, one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.

1,377 citations

Journal ArticleDOI
TL;DR: MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011–2016, is described and made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.
Abstract: Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's chest, but requires specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. Here we describe MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011-2016. Each imaging study can contain one or more images, usually a frontal view and a lateral view. A total of 377,110 images are available in the dataset. Studies are made available with a semi-structured free-text radiology report that describes the radiological findings of the images, written by a practicing radiologist contemporaneously during routine clinical care. All images and reports have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.

504 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the images.
Abstract: Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models can not read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0.

363 citations

Journal ArticleDOI
TL;DR: A new framework detects text strings with arbitrary orientations in complex natural scene images and outperforms the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation.
Abstract: Text information in natural scene images serves as important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text from a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) adjacent character grouping method and 2) text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into text string. The text line grouping method performs Hough transform to fit text line among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is presented by a rectangle region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out in multi-scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods to detect text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset collected by ourselves containing text strings in nonhorizontal orientations.
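The text line grouping step described above lends itself to a compact sketch: vote character centroids into quantized (theta, rho) bins and take the fullest bin as the strongest candidate text line. This is a simplified illustration, not the authors' implementation; the bin resolutions and the single-winner selection are invented here.

```python
import math
from collections import defaultdict

def hough_text_lines(centroids, theta_steps=180, rho_res=5.0):
    """Vote each centroid into quantized (theta, rho) bins; nearly
    collinear centroids accumulate in the same bin, so the fullest
    bin yields the strongest candidate text line."""
    votes = defaultdict(set)
    for x, y in centroids:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            # Normal-form line parameterization: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_res))].add((x, y))
    # Return the group of centroids backing the strongest line.
    return max(votes.values(), key=len)
```

For instance, six centroids lying on one horizontal line plus two stray blobs yield a winning bin containing exactly the six collinear centroids; the paper then represents each fitted line by a rectangle covering its characters.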

355 citations

Journal ArticleDOI
TL;DR: Experiments show that the proposed solution outperforms many previous solutions, and LPR can be better solved by solutions with settings oriented for different applications.
Abstract: We split the applications of vehicle license plate recognition (LPR) into three major categories and propose a solution with parameter settings that are adjustable for different applications. The three categories are access control (AC), law enforcement (LE), and road patrol (RP). Each application is characterized by variables of different variation scopes and thus requires different settings on the solution with which to deal. The proposed solution consists of three modules for plate detection, character segmentation, and recognition. Edge clustering is formulated for solving plate detection for the first time. It is also a novel application of the maximally stable extremal region (MSER) detector to character segmentation. A bilayer classifier, which is improved with an additional null class, is experimentally proven to be better than previous methods for character recognition. To assess the performance of the proposed solution, the application-oriented license plate (AOLP) database is composed and made available to the research community. Experiments show that the proposed solution outperforms many previous solutions, and LPR can be better solved by solutions with settings oriented for different applications.
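The idea of one pipeline with per-application parameter settings can be sketched as a small preset table. All parameter names and values below are invented for illustration; the paper's actual variables and thresholds differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LPRSettings:
    """Hypothetical tuning knobs for an LPR pipeline (illustrative only)."""
    min_plate_width_px: int      # smallest plate the detector accepts
    max_tilt_deg: float          # tolerated plate rotation
    null_class_threshold: float  # rejection cutoff for non-character segments

# Illustrative presets (values invented for this sketch): access-control
# cameras see near-frontal plates, while road-patrol images vary the most.
PRESETS = {
    "AC": LPRSettings(min_plate_width_px=80, max_tilt_deg=5.0, null_class_threshold=0.8),
    "LE": LPRSettings(min_plate_width_px=50, max_tilt_deg=15.0, null_class_threshold=0.6),
    "RP": LPRSettings(min_plate_width_px=30, max_tilt_deg=30.0, null_class_threshold=0.5),
}

def settings_for(application: str) -> LPRSettings:
    """Select the preset matching the application category."""
    return PRESETS[application]
```

The design point is that detection, segmentation, and recognition stay fixed while only such a preset changes per deployment category.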

253 citations

References
Book
01 Jan 1987
TL;DR: This book presents robust regression methods, including least median of squares estimation, along with algorithms, outlier diagnostics, and related statistical techniques.
Abstract: 1. Introduction. 2. Simple Regression. 3. Multiple Regression. 4. The Special Case of One-Dimensional Location. 5. Algorithms. 6. Outlier Diagnostics. 7. Related Statistical Techniques. References. Table of Data Sets. Index.

6,955 citations


"An Overview of the Tesseract OCR Engine" refers methods in this paper

  • ...Once the filtered blobs have been assigned to lines, a least median of squares fit [ 4 ] is used to estimate the baselines, and the filtered-out blobs are fitted back into the appropriate lines....

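The least-median-of-squares estimator cited here can be sketched with a simple random-sampling line fit: candidate lines through point pairs are scored by the median of their squared residuals, which keeps outlier blobs (e.g. punctuation or noise) from dragging the baseline. This is a minimal illustration of the estimator in [4], not Tesseract's baseline-fitting code.

```python
import random

def lmeds_line(points, trials=200, seed=0):
    """Least-median-of-squares fit of y = m*x + b.

    Repeatedly fits a line through a random pair of points and keeps the
    candidate whose squared residuals have the smallest median; up to
    half the points may be outliers without corrupting the fit.
    """
    rng = random.Random(seed)
    best, best_med = None, float("inf")
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair: skip this candidate
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        residuals = sorted((y - (m * x + b)) ** 2 for x, y in points)
        med = residuals[len(residuals) // 2]
        if med < best_med:
            best_med, best = med, (m, b)
    return best
```

With ten blobs on the line y = 2x + 1 and one gross outlier, the fit still recovers slope 2 and intercept 1, which is the property that lets the filtered-out blobs be fitted back into the right lines afterwards.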

01 Jan 1995
TL;DR: The annual test of optical character recognition systems known as “page readers” is described: each system accepts as input a bitmapped image of any document page and attempts to identify the machine-printed characters on the page.
Abstract: For four years, ISRI has conducted an annual test of optical character recognition (OCR) systems known as “page readers.” These systems accept as input a bitmapped image of any document page, and attempt to identify the machine-printed characters on the page. In the annual test, we measure the accuracy of this process by comparing the text that is produced as output with the correct text. The goals of the test include:

201 citations


"An Overview of the Tesseract OCR Engine" refers background or methods in this paper

  • ...The engine was sent to UNLV for the 1995 Annual Test of OCR Accuracy[ 1 ], where it proved its worth against the commercial engines of the time....


  • ...Like a supernova, it appeared from nowhere for the 1995 UNLV Annual Test of OCR Accuracy [ 1 ], shone brightly with its results, and then vanished back under the same cloak of secrecy under which it had been developed....


  • ...Prototype in the UNLV Fourth Annual Test of OCR Accuracy[ 1 ], is described in a comprehensive overview....


  • ...[ 1 ] More up-to-date results are at http://code.google.com/p/tesseract-ocr....


  • ...[ 1 ] of OCR accuracy, as “HP Labs OCR,” but the code has changed a lot since then, including conversion to Unicode and retraining....


Book ChapterDOI

141 citations


"An Overview of the Tesseract OCR Engine" refers background in this paper

  • ...A more traditional cubic spline [6] might work better....


Book
31 May 1999
TL;DR: A perspective on the performance of current OCR systems is offered by illustrating and explaining actual OCR errors made by three commercial devices, and possible approaches for improving the accuracy of today's systems are pointed to.
Abstract: Optical character recognition (OCR) is the most prominent and successful example of pattern recognition to date. There are thousands of research papers and dozens of OCR products. Optical Character Recognition: An Illustrated Guide to the Frontier offers a perspective on the performance of current OCR systems by illustrating and explaining actual OCR errors. The pictures and analysis provide insight into the strengths and weaknesses of current OCR systems, and a road map to future progress. Optical Character Recognition: An Illustrated Guide to the Frontier will pique the interest of users and developers of OCR products and desktop scanners, as well as teachers and students of pattern recognition, artificial intelligence, and information retrieval. The first chapter compares the character recognition abilities of humans and computers. The next four chapters present 280 illustrated examples of recognition errors, in a taxonomy consisting of Imaging Defects, Similar Symbols, Punctuation, and Typography. These examples were drawn from large-scale tests conducted by the authors. The final chapter discusses possible approaches for improving the accuracy of today's systems, and is followed by an annotated bibliography. Optical Character Recognition: An Illustrated Guide to the Frontier is suitable as a secondary text for a graduate level course on pattern recognition, artificial intelligence, and information retrieval, and as a reference for researchers and practitioners in industry.

129 citations

Journal ArticleDOI
01 Jul 1992
TL;DR: It is argued that it is time for a major change of approach to optical character recognition (OCR) research, and new OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components.
Abstract: It is argued that it is time for a major change of approach to optical character recognition (OCR) research. The traditional approach, focusing on the correct classification of isolated characters, has been exhausted. The demonstration of the superiority of a new classification method under operational conditions requires large experimental facilities and databases beyond the resources of most researchers. In any case, even perfect classification of individual characters is insufficient for the conversion of complex archival documents to a useful computer-readable form. Many practical OCR tasks require integrated treatment of entire documents and well-organized typographic and domain-specific knowledge. New OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components. They should also exploit the unavoidable interaction with human operators to improve themselves without explicit 'training'.

119 citations


"An Overview of the Tesseract OCR Engine" refers background in this paper

  • ...It has been suggested [11] and demonstrated [12] that OCR engines can benefit from the use of an adaptive classifier....

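The adaptive-classifier idea referenced here can be illustrated with a toy feedback loop: confident outputs of a static classifier become document-specific templates that rescue later low-confidence characters, improving accuracy on the document's particular fonts without explicit training. This nearest-template sketch is an invented simplification, not the classifier of [12] or Tesseract's actual implementation.

```python
class AdaptiveClassifier:
    """Toy per-document adaptation loop (illustrative nearest-template
    scheme; real engines use richer features and classifiers)."""

    def __init__(self, static_classify):
        # static_classify: feature vector -> (label, confidence in [0, 1])
        self.static_classify = static_classify
        self.templates = {}  # label -> feature vectors seen in this document

    def classify(self, features, confidence_threshold=0.9):
        label, conf = self.static_classify(features)
        if conf >= confidence_threshold:
            # Confident static result: remember it as a document template.
            self.templates.setdefault(label, []).append(features)
            return label
        # Weak static result: fall back to the nearest adapted template.
        best, best_d = label, float("inf")
        for lab, vecs in self.templates.items():
            for v in vecs:
                d = sum((a - b) ** 2 for a, b in zip(features, v))
                if d < best_d:
                    best, best_d = lab, d
        return best
```

Once a few confident characters have been seen, a degraded instance of the same glyph that the static classifier scores poorly is matched to the accumulated templates instead.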