Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published on this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.

Papers

Proceedings Article

A connectionist recognizer for on-line cursive handwriting recognition

S. Manke¹, U. Bodenhausen¹

Karlsruhe Institute of Technology¹

19 Apr 1994

TL;DR: The MS-TDNN integrates the high accuracy single character recognition capabilities of a TDNN with a non-linear time alignment procedure (dynamic time warping algorithm) for finding stroke and character boundaries in isolated, handwritten characters and words.

Abstract: Shows how the multi-state time delay neural network (MS-TDNN), which is already used successfully in continuous speech recognition tasks, can be applied both to online single character and cursive (continuous) handwriting recognition. The MS-TDNN integrates the high accuracy single character recognition capabilities of a TDNN with a non-linear time alignment procedure (dynamic time warping algorithm) for finding stroke and character boundaries in isolated, handwritten characters and words. In this approach each character is modelled by up to 3 different states and words are represented as a sequence of these characters. The authors describe the basic MS-TDNN architecture and the input features used in the paper, and present results (up to 97.7% word recognition rate) both on writer dependent/independent, single character recognition tasks and writer dependent, cursive handwriting tasks with varying vocabulary sizes up to 20000 words.
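The abstract above relies on dynamic time warping for non-linear time alignment. As a rough illustration only, the following is a minimal sketch of the generic textbook algorithm on two 1-D sequences; it is not the MS-TDNN alignment itself, and the function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Generic dynamic time warping cost between two 1-D sequences.

    Illustrative only: the local cost (absolute difference) is an assumption,
    not something taken from the paper.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence repeats one sample; warping absorbs the repetition.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat samples of either sequence, two sequences that differ only in local speed align at zero cost, which is the property that makes this kind of alignment useful for locating boundaries in variable-speed handwriting.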

37 citations

Proceedings Article

Customised OCR correction for historical medical text

Paul Thompson¹, John McNaught¹, Sophia Ananiadou¹

University of Manchester¹

01 Sep 2015

TL;DR: A new OCR correction strategy, customised for historical medical documents, which combines rule-based correction of regular errors with a medically-tuned spell-checking strategy, whose corrections are guided by information about subject-specific language usage from the publication period of the article to be corrected.

Abstract: Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, owing to large-scale digitisation efforts. Searchable access is typically provided by applying Optical Character Recognition (OCR) software to scanned page images. Often, however, the automatically recognised text contains a large number of errors, since OCR systems are typically optimised to deal with modern documents, and can struggle with historical document features, including variable print characteristics and archaic vocabulary usage. Low quality OCR text can reduce the efficiency of search systems over historical archives, particularly semantic systems that are based on the application of sophisticated text mining (TM) techniques. We report on a new OCR correction strategy, customised for historical medical documents. The method combines rule-based correction of regular errors with a medically-tuned spell-checking strategy, whose corrections are guided by information about subject-specific language usage from the publication period of the article to be corrected. The performance of our method compares favourably to other OCR post-correction strategies, in improving word-level accuracy of poor-quality documents by up to 16%.
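The two-stage idea in the abstract above (rule-based correction of regular errors, followed by lexicon-guided spell checking) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' system: the rules in `RULES`, the word list in `LEXICON`, and the 0.8 similarity cutoff are all hypothetical, and the fuzzy matching uses Python's standard `difflib` rather than a medically tuned spell checker.

```python
import difflib
import re

# Hypothetical rules for regular OCR errors (pattern, replacement); not from the paper.
RULES = [
    (re.compile("ſ"), "s"),       # long s read literally
    (re.compile(r"\bvv"), "w"),   # word-initial "vv" misread for "w"
]

# Hypothetical tiny lexicon standing in for period- and domain-specific vocabulary.
LEXICON = ["fever", "patient", "disease", "physician", "was", "with"]


def correct_token(token: str) -> str:
    """Apply the rules first, then fall back to the closest lexicon entry."""
    for pattern, replacement in RULES:
        token = pattern.sub(replacement, token)
    if token.lower() in LEXICON:
        return token
    # Closest lexicon word, if similar enough; returned in lexicon (lower) case.
    matches = difflib.get_close_matches(token.lower(), LEXICON, n=1, cutoff=0.8)
    return matches[0] if matches else token


def correct_text(text: str) -> str:
    """Correct a whitespace-separated string token by token."""
    return " ".join(correct_token(t) for t in text.split())


if __name__ == "__main__":
    print(correct_text("patient vvas ill"))
```

Tokens with no sufficiently close lexicon entry are left unchanged, which keeps the sketch from over-correcting words that are simply missing from the small word list.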

37 citations

Proceedings Article

Neural network based handwritten numeral recognition of Kannada and Telugu scripts

S.V. Rajashekararadhya¹, Prashant Ranjan¹

Anna University¹

01 Nov 2008

TL;DR: A zone and distance metric based feature extraction system is presented, and recognition rates of 98% and 96% are obtained for Kannada and Telugu numerals, respectively.

Abstract: Character recognition is an important area in the fields of image processing and pattern recognition. Handwritten character recognition has received extensive attention in academic and production fields. The recognition system can be either on-line or off-line. Off-line handwriting recognition is a subfield of optical character recognition. India is a multi-lingual and multi-script country, which has eighteen accepted official scripts and over a hundred regional languages. In this paper we present a zone and distance metric based feature extraction system. The character centroid is computed and the image is further divided into n equal zones. The average distance from the character centroid to each pixel present in the zone is computed. This procedure is repeated for all the zones present in the numeral image. Finally, n such features are extracted for classification and recognition. A feed-forward back-propagation neural network is designed for subsequent classification and recognition. We obtained recognition rates of 98% and 96% for Kannada and Telugu numerals, respectively.
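The feature extraction steps described in the abstract above (compute the character centroid, split the image into n equal zones, average the centroid-to-pixel distance per zone) might be sketched as below. Several details are assumptions made for illustration and are not stated in the abstract: the image is a binary list of lists with nonzero foreground pixels, the zones form a `zone_rows` by `zone_cols` grid, Euclidean distance is used, and empty zones yield a feature of 0.0.

```python
import math
from typing import List


def zone_centroid_features(image: List[List[int]], zone_rows: int, zone_cols: int) -> List[float]:
    """Average distance from the character centroid to the foreground pixels of each zone.

    Assumptions (not from the paper): binary image, rectangular grid of zones,
    Euclidean distance, 0.0 for zones without foreground pixels.
    """
    height, width = len(image), len(image[0])
    if zone_rows > height or zone_cols > width:
        raise ValueError("more zones than pixels along one axis")
    n_zones = zone_rows * zone_cols
    foreground = [(r, c) for r in range(height) for c in range(width) if image[r][c]]
    if not foreground:
        return [0.0] * n_zones
    # Centroid of all foreground pixels of the character.
    cy = sum(r for r, _ in foreground) / len(foreground)
    cx = sum(c for _, c in foreground) / len(foreground)
    zone_h, zone_w = height // zone_rows, width // zone_cols
    sums = [0.0] * n_zones
    counts = [0] * n_zones
    for r, c in foreground:
        # Leftover rows/columns (non-divisible sizes) fall into the last zone.
        zr = min(r // zone_h, zone_rows - 1)
        zc = min(c // zone_w, zone_cols - 1)
        idx = zr * zone_cols + zc
        sums[idx] += math.hypot(r - cy, c - cx)
        counts[idx] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, counts)]


if __name__ == "__main__":
    corners = [
        [1, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
    ]
    print(zone_centroid_features(corners, 2, 2))
```

The resulting list of n values is the kind of fixed-length feature vector that could then be passed to a classifier such as the feed-forward network mentioned in the abstract.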

37 citations

Proceedings Article

A Fuzzy Technique for Segmentation of Handwritten Bangla Word Images

Subhadip Basu¹, Ram Sarkar², Nibaran Das¹, Mahantapas Kundu¹, Mita Nasipuri¹, Dipak Kumar Basu¹

Jadavpur University¹, MCKV Institute of Engineering²

05 Mar 2007

TL;DR: A fuzzy technique for segmentation of handwritten Bangla word images is presented and can be considered as a significant step towards the development of a full-fledged Bangla OCR system, especially for handwritten documents.

Abstract: A fuzzy technique for segmentation of handwritten Bangla word images is presented. It works in two steps. In the first step, the black pixels constituting the Matra (i.e., the longest horizontal line joining the tops of individual characters of a Bangla word) in the target word image are identified by using a fuzzy feature. In the second step, some of the black pixels on the Matra are identified as segment points (i.e., the points through which the word is to be segmented) by using three fuzzy features. On experimentation with a set of 210 samples of handwritten Bangla words, collected from different sources, the average success rate of the technique is shown to be 95.32%. Apart from certain limitations, the technique can be considered as a significant step towards the development of a full-fledged Bangla OCR system, especially for handwritten documents.

37 citations

Journal Article

Word Segmentation Method for Handwritten Documents based on Structured Learning

Jewoong Ryu¹, Hyung Il Koo², Nam Ik Cho¹

Seoul National University¹, Ajou University²

08 Jan 2015 - IEEE Signal Processing Letters

TL;DR: This work formulates the word segmentation problem as a binary quadratic assignment problem that considers pairwise correlations between the gaps as well as the likelihoods of individual gaps, and estimates all parameters based on the Structured SVM framework so that the proposed method works well regardless of writing styles and written languages without user-defined parameters.

Abstract: Segmentation of handwritten document images into text-lines and words is an essential task for optical character recognition. However, since the features of handwritten documents are irregular and diverse depending on the person, it is considered a challenging problem. In order to address the problem, we formulate the word segmentation problem as a binary quadratic assignment problem that considers pairwise correlations between the gaps as well as the likelihoods of individual gaps. Even though many parameters are involved in our formulation, we estimate all parameters based on the Structured SVM framework so that the proposed method works well regardless of writing styles and written languages without user-defined parameters. Experimental results on ICDAR 2009/2013 handwriting segmentation databases show that the proposed method achieves state-of-the-art performance on Latin-based and Indian languages.
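To make the idea of a binary quadratic labelling of gaps more concrete, here is a small sketch under loudly stated assumptions. The abstract does not give the objective, so the form used here (a per-gap score plus pairwise interaction scores, maximised over 0/1 labels where 1 means "inter-word gap") is hypothetical, the numeric weights are made up rather than learned with a Structured SVM, and the exhaustive search is only practical for a handful of gaps.

```python
from itertools import product
from typing import Dict, List, Tuple


def best_gap_labels(unary: List[float], pairwise: Dict[Tuple[int, int], float]) -> Tuple[int, ...]:
    """Exhaustively maximise sum_i u_i*x_i + sum_(i,j) w_ij*x_i*x_j over x in {0,1}^n.

    Hypothetical objective for illustration; x_i = 1 marks gap i as a word boundary.
    """
    best_score = float("-inf")
    best_labels: Tuple[int, ...] = ()
    for labels in product((0, 1), repeat=len(unary)):
        score = sum(u * x for u, x in zip(unary, labels))
        score += sum(w * labels[i] * labels[j] for (i, j), w in pairwise.items())
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels


if __name__ == "__main__":
    unary = [2.0, -1.0, 0.5]       # made-up per-gap scores
    pairwise = {(0, 2): -3.0}      # made-up penalty for labelling gaps 0 and 2 together
    print(best_gap_labels(unary, {}))        # per-gap scores only
    print(best_gap_labels(unary, pairwise))  # with the pairwise term
```

With only the per-gap scores, gaps 0 and 2 are both labelled as word boundaries; adding the pairwise penalty changes the decision for gap 2, which illustrates why pairwise terms can matter beyond individual gap likelihoods.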

37 citations

Network Information

Performance

Metrics

Papers: 7,941
Citations: 180,323

No. of papers in the topic in previous years
Year	Papers
2023	186
2022	425
2021	333
2020	448
2019	430
2018	357