Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7342 publications have been published within this topic, receiving 158193 citations. The topic is also known as OCR and optical character reader.

Papers

Proceedings Article

High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training

Qiang Huo¹, Yong Ge, Zhi-Dan Feng

University of Hong Kong¹

07 May 2001

TL;DR: Three key techniques contributing to the high recognition accuracy are highlighted, namely the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as a criterion for model training.

Abstract: We have developed a Chinese OCR engine for machine printed documents. Currently, our OCR engine can support a vocabulary of 6921 characters which include 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kat, He, Yuan, LiShu, WeiBei, XingKai, etc. The averaged character recognition accuracy is above 99% for newspaper quality documents with a recognition speed of about 250 characters per second on a Pentium III-450 MHz PC yet only consuming less than 2 MB memory. We describe the key technologies we used to construct the above recognizer. Among them, we highlight three key techniques contributing to the high recognition accuracy, namely the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as a criterion for model training.

72 citations
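
The Gabor features named in the abstract above can be illustrated with a small sketch. This is not the authors' feature extractor: the kernel size, wavelength, sigma, number of orientations, and the function names `gabor_kernel`, `gabor_response`, and `gabor_features` are all assumptions chosen for a toy example. It only shows the general idea that a Gabor filter responds most strongly to strokes of a matching orientation.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0):
    """Real (cosine) part of a 2-D Gabor filter; size must be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_response(image, kernel, cy, cx):
    """Filter response at pixel (cy, cx); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            y, x = cy + ky - half, cx + kx - half
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += weight * image[y][x]
    return total

def gabor_features(image, orientations=4, size=7, wavelength=4.0, sigma=2.0):
    """Absolute response at the image centre, one value per orientation."""
    cy, cx = len(image) // 2, len(image[0]) // 2
    features = []
    for k in range(orientations):
        theta = k * math.pi / orientations
        kernel = gabor_kernel(size, wavelength, theta, sigma)
        features.append(abs(gabor_response(image, kernel, cy, cx)))
    return features

# Toy 9x9 image (hypothetical data): one vertical stroke in the middle column.
vertical_stroke = [[1.0 if x == 4 else 0.0 for x in range(9)] for y in range(9)]
features = gabor_features(vertical_stroke)
```

With these assumed parameters, the first entry of `features` (theta = 0, carrier varying horizontally) is the largest, because the vertical stroke lines up with the crest of that filter. A real recognizer would sample such responses at many positions and feed them to a classifier.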

Journal Article

SCUT-COUCH2009—a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation

Lianwen Jin¹, Yan Gao¹, Gang Liu¹, Yunyang Li¹, Kai Ding¹ (+1 more)

South China University of Technology¹

01 Mar 2011 - International Journal on Document Analysis and Recognition

TL;DR: The SCUT-COUCH2009 database is the first publicly available large vocabulary online Chinese handwriting database containing multi-type character/word samples and some evaluation results on the database are reported using state-of-the-art recognizers for benchmarking.

Abstract: A comprehensive online unconstrained Chinese handwriting dataset, SCUT-COUCH2009, is introduced in this paper. As a revision of SCUT-COUCH2008 [1], the SCUT-COUCH2009 database consists of more datasets with larger vocabularies and more writers. The database is built to facilitate the research of unconstrained online Chinese handwriting recognition. It is comprehensive in the sense that it consists of 11 datasets of different vocabularies, named GB1, GB2, TradGB1, Big5, Pinyin, Letters, Digit, Symbol, Word8888, Word17366 and Word44208. In particular, the SCUT-COUCH2009 database contains handwritten samples of 6,763 single Chinese characters in the GB2312-80 standard, 5,401 traditional Chinese characters of the Big5 standard, 1,384 traditional Chinese characters corresponding to the level 1 characters of the GB2312-80 standard, 8,888 frequently used Chinese words, 17,366 daily-used Chinese words, 44,208 complete words from the Fourth Edition of “The Contemporary Chinese Dictionary”, 2,010 Pinyin and 184 daily-used symbols. The samples were collected using PDAs (Personal Digital Assistants) and smart phones with touch screens and were contributed by more than 190 persons. The total number of character samples is over 3.6 million. The SCUT-COUCH2009 database is the first publicly available large vocabulary online Chinese handwriting database containing multi-type character/word samples. We report some evaluation results on the database using state-of-the-art recognizers for benchmarking.

72 citations

Journal Article

Direct extraction of topographic features for gray scale character recognition

Seong-Whan Lee¹, Young Joon Kim

Korea University¹

01 Jul 1995 - IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a new method for the direct extraction of topographic features from gray scale character images that computes the directions of principal curvature efficiently and prevents the extraction of unnecessary features.

Abstract: Optical character recognition (OCR) traditionally applies to binary-valued imagery although text is always scanned and stored in gray scale. However, binarization of a multivalued image may remove important topological information from characters and introduce noise to the character background. In order to avoid this problem, it is indispensable to develop a method which can minimize the information loss due to binarization by extracting features directly from gray scale character images. In this paper, we propose a new method for the direct extraction of topographic features from gray scale character images. By comparing the proposed method with Wang and Pavlidis' method, we found that the proposed method enhanced the performance of topographic feature extraction by computing the directions of principal curvature efficiently and prevented the extraction of unnecessary features. We also show that the proposed method is very effective for gray scale skeletonization compared to Levi and Montanari's method.

72 citations
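
The abstract above states that binarization can remove topological information from a character. The sketch below illustrates only that claim, not the paper's topographic feature extraction: the tiny image, the two thresholds, and the helper names `binarize` and `count_components` are hypothetical. A stroke with one faint pixel stays connected under a lenient threshold but breaks into two pieces under a strict one.

```python
def binarize(gray, threshold):
    """Mark a pixel as ink (1) when it is darker than the threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def count_components(binary):
    """Count 4-connected groups of ink pixels using an iterative flood fill."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for sy in range(height):
        for sx in range(width):
            if binary[sy][sx] and not seen[sy][sx]:
                count += 1
                seen[sy][sx] = True
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

# Hypothetical gray-scale stroke: 0 is black ink, 255 is white paper.
# The pixel with value 150 is a faint joint in the middle of the stroke.
gray = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  20,  30, 150,  25,  20, 255],
    [255, 255, 255, 255, 255, 255, 255],
]

strict = count_components(binarize(gray, 128))   # faint joint is discarded
lenient = count_components(binarize(gray, 200))  # faint joint is kept
```

Here `strict` is 2 and `lenient` is 1: the same stroke is one connected piece or two depending on a single global threshold, which is the kind of loss that working directly on gray scale values is meant to avoid.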

Proceedings Article

A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research

Saeed Mozaffari, Karim Faez, Farhad Faradji, Majid Ziaratban, S. Mohamad Golzan (+1 more)

23 Oct 2006

TL;DR: A new comprehensive database for isolated offline handwritten Farsi/Arabic numbers and characters is presented for use in optical character recognition research; it is freely available for academic use.

Abstract: This paper presents a new comprehensive database for isolated offline handwritten Farsi/Arabic numbers and characters for use in optical character recognition research. The database is freely available for academic use. Until now, no such freely available database has existed for the Farsi language. Grayscale images of 52,380 characters and 17,740 numerals are included. Each image was scanned from Iranian school entrance exam forms during the years 2004-2006 at 300 dpi. The only restriction imposed on the writers is to write each character within a rectangular box. The number of samples in each class of the database is non-uniform, corresponding to their real-life distributions. Also, for comparison purposes, each dataset has been properly divided into respective training and test sets.

72 citations

Proceedings Article

Line detection and segmentation in historical church registers

M. Feldbach¹, Klaus D. Tönnies¹

Otto-von-Guericke University Magdeburg¹

10 Sep 2001

TL;DR: Algorithms are described for transforming paper documents into a representation of text suitable as input for an automatic text recognizer; line segmentation was found to be successful in 97% of all samples.

Abstract: To automatically acquire the information recorded in church registers and other historical scriptures, the writing on these documents has to be recognized. This paper describes algorithms for transforming the paper documents into a representation of text suitable as input for an automatic text recognizer. The automatic recognition of old handwritten scriptures is difficult for two main reasons: lines of text in general are not straight, and ascenders and descenders of adjacent lines interfere. The algorithms described in this paper provide ways to reconstruct the path of the lines of text using an approach of gradually constructing line segments until a unique line of text is formed. In addition, the single lines are segmented and an output in the form of a raster image is provided. The method was applied to church registers written between the 17th and 19th centuries. Line segmentation was found to be successful in 97% of all samples.

72 citations
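
To see why the two difficulties named in the abstract above matter, consider a horizontal projection profile, a much simpler baseline for line segmentation. This is not the paper's algorithm, which builds line segments gradually; the function names `projection_profile` and `split_lines` and the toy pages are hypothetical. The baseline assumes straight, well-separated lines and fails as soon as a descender touches the gap between two lines.

```python
def projection_profile(binary):
    """Number of ink pixels in each row of a binary page (1 = ink)."""
    return [sum(row) for row in binary]

def split_lines(binary, min_ink=1):
    """Return inclusive (first_row, last_row) pairs for runs of rows with ink."""
    lines = []
    start = None
    for y, ink in enumerate(projection_profile(binary)):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # an empty row ends the line
            start = None
    if start is not None:
        lines.append((start, len(binary) - 1))
    return lines

# Hypothetical page with two straight text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = split_lines(page)

# Same page, but a descender from the first line reaches into the gap row.
touching = [row[:] for row in page]
touching[3][0] = 1
merged = split_lines(touching)
```

On the clean page the baseline finds two lines, rows 1 to 2 and rows 4 to 5. On the second page the single descender pixel merges them into one block spanning rows 1 to 5, which is the interference problem the abstract describes.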

Performance

Metrics

Papers: 7,941

Citations: 180,323

No. of papers in the topic in previous years
Year	Papers
2023	186
2022	425
2021	333
2020	448
2019	430
2018	357
