Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published within this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.

Papers published on a yearly basis (chart, 1969 to 2023)

Papers

Proceedings Article•DOI•

A Standalone OCR System for Mobile Cameraphones

Mikael Laine¹, Olli Nevalainen¹•Institutions (1)

Turku Centre for Computer Science¹

11 Dec 2006

TL;DR: This article describes a simple OCR system that was implemented in Symbian C++ to be run on a stock Nokia 6630 cameraphone and is limited to recognizing English capital letters printed in black, against a white background.

Abstract: In optical character recognition (OCR), visible characters appearing as images (i.e. on paper) are recognized as symbolic characters and stored in a computer's memory or similar device. The purpose of this work is to find whether current mobile cameraphones are able to run OCR software without relying on dedicated hardware or facilities provided by the network. This article describes a simple OCR system that was implemented in Symbian C++ to be run on a stock Nokia 6630 cameraphone. The system is limited to recognizing English capital letters printed in black, against a white background. The opportunities and hardships related to bringing OCR to run on mobile platforms having image capturing capability are also discussed in more general terms.

40 citations

Proceedings Article•DOI•

Optical character recognition errors and their effects on natural language processing

Daniel P. Lopresti¹•Institutions (1)

Lehigh University¹

24 Jul 2008

TL;DR: A new paradigm is proposed for measuring the impact of recognition errors on the stages of a standard text analysis pipeline: sentence boundary detection, tokenization, and part-of-speech tagging, which formulates error classification as an optimization problem solvable using a hierarchical dynamic programming approach.

Abstract: Errors are unavoidable in advanced computer vision applications such as optical character recognition, and the noise induced by these errors presents a serious challenge to down-stream processes that attempt to make use of such data. In this paper, we apply a new paradigm we have proposed for measuring the impact of recognition errors on the stages of a standard text analysis pipeline: sentence boundary detection, tokenization, and part-of-speech tagging. Our methodology formulates error classification as an optimization problem solvable using a hierarchical dynamic programming approach. Errors and their cascading effects are isolated and analyzed as they travel through the pipeline. We present experimental results based on a large collection of scanned pages to study the varying impact depending on the nature of the error and the character(s) involved. The problem of identifying tabular structures that should not be parsed as sentential text is also discussed.

40 citations
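
The Lopresti abstract above frames error classification as an optimization problem solved by dynamic programming. As a rough illustration only, the sketch below uses plain character-level edit-distance alignment between a ground-truth string and an OCR output, and labels each mismatch as a substitution, insertion, or deletion. It is not the hierarchical method from the paper; the function name `align_errors` and the unit costs are assumptions made for this example.

```python
from typing import List, Tuple


def align_errors(truth: str, ocr: str) -> List[Tuple[str, str, str]]:
    """Return (kind, truth_char, ocr_char) for each edit turning truth into ocr.

    Hypothetical helper: flat Levenshtein alignment with unit costs,
    not the hierarchical scheme described in the paper.
    """
    n, m = len(truth), len(ocr)
    # cost[i][j] = minimum number of edits turning truth[:i] into ocr[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one optimal alignment.
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and truth[i - 1] != ocr[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                ops.append(("substitution", truth[i - 1], ocr[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", truth[i - 1], ""))
            i -= 1
        else:
            ops.append(("insertion", "", ocr[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

The number of returned operations equals the edit distance, so the same list can feed a simple error count as well as a per-character breakdown of which symbols were confused.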

Proceedings Article•DOI•

Robust Recognition of Degraded Documents Using Character N-Grams

Shrey Dutta¹, Naveen Sankaran¹, K. Pramod Sankar², C. V. Jawahar¹•Institutions (2)

International Institute of Information Technology, Hyderabad¹, Xerox²

27 Mar 2012

TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.

Abstract: In this paper we present a novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images. OCRs have considerably good performance on good quality documents, but fail easily in presence of degradations. Also, classical OCR approaches perform poorly over complex scripts such as those for Indian languages. We address these issues by proposing to recognize character n-gram images, which are basically groupings of consecutive character/component segments. Our approach is unique, since we use the character n-grams as a primitive for recognition rather than for post processing. By exploiting the additional context present in the character n-gram images, we enable better disambiguation between confusing characters in the recognition phase. The labels obtained from recognizing the constituent n-grams are then fused to obtain a label for the word that emitted them. Our method is inherently robust to degradations such as cuts and merges which are common in digital libraries of scanned documents. We also present a reliable and scalable scheme for recognizing character n-gram images. Tests on English and Malayalam document images show considerable improvement in recognition in the case of heavily degraded documents.

40 citations
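
The n-gram abstract above describes grouping consecutive character segments into n-grams, recognizing each n-gram, and fusing the resulting labels into a word label. The abstract does not say how the fusion is done, so the sketch below assumes a simple per-position majority vote over overlapping n-gram labels. The names `char_ngrams` and `fuse_labels` are hypothetical, and plain strings stand in for segment images.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def char_ngrams(segments: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Group consecutive segments into overlapping n-grams."""
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]


def fuse_labels(ngram_labels: List[str], word_len: int) -> str:
    """Fuse overlapping n-gram labels into one word label.

    Assumption for illustration: the k-th label covers word positions
    k, k+1, ..., and each position takes the most frequent character
    proposed for it. Positions with no vote are marked '?'.
    """
    votes = [Counter() for _ in range(word_len)]
    for start, label in enumerate(ngram_labels):
        for offset, ch in enumerate(label):
            if start + offset < word_len:
                votes[start + offset][ch] += 1
    return "".join(v.most_common(1)[0][0] if v else "?" for v in votes)
```

For example, if the trigram recognizer returns `"wor"`, `"ond"`, `"rds"` for a five-segment word, the misread `n` at position 2 is outvoted by the two neighbouring trigrams that both read `r`, which is one way the extra context can help disambiguation.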

Journal Article•DOI•

A multiple feature/resolution scheme to Arabic (Indian) numerals recognition using hidden Markov models

Sameh M. Awaidah¹, Sabri A. Mahmoud¹•Institutions (1)

King Fahd University of Petroleum and Minerals¹

01 Jun 2009-Signal Processing

TL;DR: The presented technique, which is writer independent, proved to be effective in the automatic recognition of Arabic (Indian) numerals in terms of the highest recognition rate possible.

40 citations

Proceedings Article•DOI•

LBP Based Line-Wise Script Identification

Miguel Ferrer, Aythami Morales, Umapada Pal¹•Institutions (1)

Indian Statistical Institute¹

25 Aug 2013

TL;DR: This paper proposed a new algorithm for printed script identification based on texture analysis that uses the histogram of the local patterns as description of the script stroke directions distribution which is the characteristic of every script.

Abstract: Script identification is an important step in multi-script document analysis. Because the textures present in the text portion of a script are its main distinguishing features, this paper proposes a new algorithm for printed script identification based on texture analysis. Since local patterns are a unifying concept for traditional statistical and structural approaches to texture analysis, the basic idea is to use the histogram of the local patterns as a description of the distribution of script stroke directions, which is characteristic of every script. As local patterns, the basic version of the Local Binary Patterns (LBP) and a modified version of the Orientation of the Local Binary Patterns (OLBP) are proposed. A Least Squares Support Vector Machine (LS-SVM) is used as the identifier. The scheme has been verified on two databases. The first, or training, database contains 200 sheets in 10 different scripts; the script fonts are those provided by the Google translator. The second, or test, database was obtained by scanning different newspapers and books and contains 5 of the 10 scripts in the first database. The experiments gave encouraging results.

40 citations
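
The script-identification abstract above uses a histogram of local binary patterns as the texture descriptor. The sketch below shows only the basic 8-neighbour LBP histogram on a grayscale image given as nested lists. The neighbour ordering, the `>=` comparison, and the function name `lbp_histogram` are assumptions for this example; the paper's modified OLBP variant and the LS-SVM classifier are not reproduced here.

```python
from typing import List, Sequence

# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
# The ordering is an arbitrary choice made for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Return the normalized 256-bin histogram of basic LBP codes.

    Border pixels are skipped because they lack a full 3x3 neighbourhood,
    so the image should be at least 3x3.
    """
    rows, cols = len(image), len(image[0])
    counts = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            counts[code] += 1
    total = sum(counts)
    return [x / total for x in counts] if total else [0.0] * 256
```

A feature vector like this, computed per text line, is the kind of input a classifier would then separate by script.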

Performance

Metrics

Papers: 7,941
Citations: 180,323

No. of papers in the topic in previous years
Year	Papers
2023	186
2022	425
2021	333
2020	448
2019	430
2018	357
