scispace - formally typeset
Search or ask a question
Topic

Optical character recognition

About: Optical character recognition is a research topic. Over the lifetime, 7342 publications have been published within this topic receiving 158193 citations. The topic is also known as: OCR & optical character reader.


Papers
More filters
Journal ArticleDOI
TL;DR: The statistics show that the HIT-MW database has an excellent representation of the real handwriting and many new applications concerning real handwriting recognition can be supported by the database.
Abstract: A Chinese handwriting database named HIT-MW is presented to facilitate the offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or middleman instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters that are produced under an unconstrained condition without preprinted character boxes. The statistics show that the database has an excellent representation of the real handwriting. Many new applications concerning real handwriting recognition can be supported by the database.

142 citations

Proceedings ArticleDOI
18 Aug 1997
TL;DR: The architecture of a system for reading machine-printed documents in known predefined tabular-data layout styles, and algorithms for identifying and segmenting records with known layout, and integration of these algorithms with a graphical user interface (GUI) for defining new layouts are described.
Abstract: We describe the architecture of a system for reading machine-printed documents in known predefined tabular-data layout styles. In these tables, textual data are presented in record lines made up of fixed-width fields. Tables often do not rely on line-art (ruled lines) to delimit fields, and in this way differ crucially from fixed forms. Our system performs these steps: copes with multiple tables per page; identifies records within tables; segments records into fields; and recognizes characters within fields, constrained by field-specific contextual knowledge. Obstacles to good performance on tables include small print, tight line-spacing, poor-quality text (such as photocopies), and line-art or background patterns that touch the text. Precise skew-correction and pitch-estimation, and high-performance OCR using neural nets proved crucial in overcoming these obstacles. The most significant technical advances in this work appear to be algorithms for identifying and segmenting records with known layout, and integration of these algorithms with a graphical user interface (GUI) for defining new layouts. This GUI has been ergonomically designed to make efficient and intuitive use of exemplary images, so that the skill and manual effort required to retarget the system to new table layouts are held to a minimum. The system has been applied in this way to more than 400 distinct tabular layouts. During the last three years the system has read over fifty million records with high accuracy.

142 citations

Patent
28 Aug 1991
TL;DR: In this paper, a document containing highlighted regions is scanned using a gray scale scanner and Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document.
Abstract: A method and apparatus for detection of highlighted regions of a document. A document containing highlighted regions is scanned using a gray scale scanner. Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document. Having separated the highlighted and non-highlighted portions, optical character recognition (OCR) techniques can then be used to extract text from the highlighted regions.

142 citations

Journal ArticleDOI
TL;DR: It is shown that the image representation, vectorization techniques, and optical character recognition subsystem are quite general and that the methodology implemented in the system can be generalized to the acquisition of other classes of line drawings.
Abstract: A system for the automatic acquisition of land register maps is described. The system converts paper-based documents into digital form for integration into an existing database. Processing a map begins with its digitization by a scanning device. A key step in the system is the conversion from raster format to graph representation, a special binary image format suitable for processing line structures. Subsequent steps include vectorization of the line structures and recognition of the symbols interspersed in the drawing. The system requires operator interaction only to resolve ambiguities and correct errors in the automatic processes. The final result is a set of descriptors for all detected map entities, which is then stored in a database. It is shown that the image representation, vectorization techniques, and optical character recognition subsystem are quite general and that the methodology implemented in the system can be generalized to the acquisition of other classes of line drawings. >

140 citations

Journal ArticleDOI
TL;DR: This review article serves the purpose of presenting state of the art results and techniques on OCR and also provide research directions by highlighting research gaps.
Abstract: Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents have invaluable practical worth. Optical character recognition is a science that enables to translate various types of documents or images into analyzable, editable and searchable data. During last decade, researchers have used artificial intelligence/machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) which were published between year 2000 to 2019. We followed widely used electronic databases by following pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to search all the articles related to the topic. After carefully following study selection process 176 articles were selected for this SLR. This review article serves the purpose of presenting state of the art results and techniques on OCR and also provide research directions by highlighting research gaps.

139 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
87% related
Feature (computer vision)
128.2K papers, 1.7M citations
85% related
Image segmentation
79.6K papers, 1.8M citations
85% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023186
2022425
2021333
2020448
2019430
2018357