Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published on this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.


Papers
Journal ArticleDOI
TL;DR: This work introduces a method that automatically computes a two-channel profile from an OCRed historical text, and shows that the true distribution of spelling variation patterns and recognition errors in the OCRed text correlates strongly with the ranks and scores estimated automatically in the profiles.

33 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel, segmentation-free, fast and efficient technique for the detection and recognition of characters and character ligatures based on an open and closed cavity character representation.
Abstract: Recognition of Old Greek Early Christian manuscripts is essential for efficient content exploitation of the valuable Old Greek Early Christian historical collections. In this paper, we focus on the problem of recognizing Old Greek manuscripts and propose a novel recognition technique that has been tested in a large number of important historical manuscript collections which are written in lowercase letters and originate from St. Catherine’s Mount Sinai Monastery. Based on an open and closed cavity character representation, we propose a novel, segmentation-free, fast and efficient technique for the detection and recognition of characters and character ligatures. First, we detect open and closed cavities that exist in the skeletonized character body. Then, the classification of a specific character or character ligature is based on the protrusible segments that appear in the topological description of the character skeletons. Experimental results prove the efficiency of the proposed approach.

33 citations
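
To make the notion of a closed cavity concrete, here is a minimal sketch that counts background regions fully enclosed by ink in a small binary glyph. It is not the paper's method: it ignores skeletonization, open cavities, and protrusible segments. The grid encoding (`#` for ink, `.` for background) and the function name `count_closed_cavities` are assumptions made for this illustration.

```python
from collections import deque


def count_closed_cavities(grid):
    """Count background regions that never reach the image border.

    grid: list of equal-length strings, '#' = ink, '.' = background
    (a hypothetical encoding chosen for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0, c0):
        # Breadth-first fill of one background region (4-connectivity).
        # Returns True if the region touches the image border.
        touches_border = False
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and grid[nr][nc] == "."):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return touches_border

    cavities = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "." and not seen[r][c]:
                # A region that never reaches the border is enclosed by ink.
                if not flood(r, c):
                    cavities += 1
    return cavities


# Toy glyphs: an "o"-like ring and a "c"-like shape that is open on the right.
GLYPH_O = [".....", ".###.", ".#.#.", ".###.", "....."]
GLYPH_C = [".....", ".###.", ".#...", ".###.", "....."]
```

Under this encoding, the ring has one enclosed region and the open shape has none, which is the kind of topological cue the abstract describes using for classification.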

Proceedings ArticleDOI
Xiaofan Lin1
20 Jan 2003
TL;DR: A robust algorithm extracts headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated from OCR; fuzzy string matching makes the method robust against OCR errors.
Abstract: This paper introduces a robust algorithm to extract headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated from OCR. Compared with the conventional methods based on the page-level layout and format, the proposed strategy considers a page in the context of neighboring pages. Through the page-association, the headers and footers in different patterns can be automatically detected without human interference or individual templates. In addition, fuzzy string match makes the method robust against OCR errors.

33 citations
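
The page-association idea in this abstract can be sketched as follows: treat the first (or last) line of a page as a header (or footer) if it fuzzily matches the corresponding line on a neighboring page. This is only an illustration under stated assumptions, not the paper's algorithm: the function names, the 0.8 similarity threshold, the digit normalization, and the use of Python's standard `difflib.SequenceMatcher` as the fuzzy matcher are all choices made here.

```python
import re
from difflib import SequenceMatcher


def normalize(line):
    # Collapse digit runs so changing page numbers do not block a match.
    return re.sub(r"\d+", "#", line.strip().lower())


def similar(a, b, threshold=0.8):
    # Fuzzy comparison that tolerates a few OCR character errors.
    # The threshold is an assumed value, not taken from the paper.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def find_repeated_edge_lines(pages, edge=0, threshold=0.8):
    """Return, per page, the edge line if a neighboring page repeats it.

    pages: list of pages, each a list of text lines.
    edge: 0 to test the first line (header), -1 for the last line (footer).
    """
    result = []
    for i, page in enumerate(pages):
        candidate = page[edge] if page else None
        neighbors = [
            pages[j][edge]
            for j in (i - 1, i + 1)
            if 0 <= j < len(pages) and pages[j]
        ]
        if candidate is not None and any(
            similar(candidate, n, threshold) for n in neighbors
        ):
            result.append(candidate)
        else:
            result.append(None)
    return result


# Toy input with simulated OCR errors ("1" for "l", "0" for "o").
PAGES = [
    ["Annual Report 2003  Page 1", "Intro text"],
    ["Annua1 Report 2003  Page 2", "More text"],
    ["Annual Rep0rt 2003  Page 3", "Other"],
    ["Completely different", "x"],
]
HEADERS = find_repeated_edge_lines(PAGES, edge=0)
```

Comparing only adjacent pages keeps the sketch short; a fuller version might look at a wider window of pages and at more than one line per page edge.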

Proceedings ArticleDOI
23 Sep 2007
TL;DR: The method is model-driven and is intended to annotate a large collection of documents, scanned at three different resolutions, at the character level, and employs an XML representation for storage of the annotation information.
Abstract: A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creating annotated corpora is tedious and laborious, especially when the annotation is at the character level. In this paper, we propose an efficient hierarchical approach for annotating a large collection of printed document images. We align document images with independently keyed-in text. The method is model-driven and is intended to annotate a large collection of documents, scanned at three different resolutions, at the character level. We employ an XML representation for storage of the annotation information. APIs are provided for access at the content level for easy use in training and evaluation of OCRs and other document understanding tasks.

33 citations

Proceedings ArticleDOI
14 Aug 1995
TL;DR: This contribution considers the construction of hyperdocuments, that is, converting scanned paper documents into electronic hypertext, with a focus on hyperlinks between the text and labels in a picture.
Abstract: In this contribution we consider the construction of hyperdocuments, that is, converting scanned paper documents into electronic hypertext. Hyperlink creation is automated by analyzing the structure and content of the scanned document. The focus is on hyperlinks between the text and labels in a picture. A number of tools for such hyperlink detection are described. Practical results are presented.

33 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Feature (computer vision): 128.2K papers, 1.7M citations, 85% related
Image segmentation: 79.6K papers, 1.8M citations, 85% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    186
2022    425
2021    333
2020    448
2019    430
2018    357