Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published within this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.


[Chart: Papers published on a yearly basis, 1969 to 2023]

Papers

Text Retrieval from Document Images based on N-Gram Algorithm.

Chew Lim Tan, Sam Yuan Sung, Zhaohui Yu, Yi Xu

01 Jan 2000

TL;DR: A method of text retrieval from document images using a similarity measure based on an N-Gram algorithm to directly extract image features instead of using optical character recognition.

Abstract: In this paper, we propose a method of text retrieval from document images using a similarity measure based on an N-Gram algorithm. We directly extract image features instead of using optical character recognition. Character image objects are first extracted from document images based on connected components, and then an unsupervised classifier is used to classify these objects. All objects are encoded according to one unified class set, and each document image is represented by one stream of object codes. Next, we retrieve N-Gram slices from these streams and build document vectors. Lastly, we obtain the pair-wise similarity of document images by means of the scalar product of the document vectors. Four corpora of news articles were used to test the validity of our method. During the test, the similarity of document images computed with this method was compared with the result obtained on the ASCII versions of those documents using the N-Gram algorithm for text documents.
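
The last two steps of the abstract (N-Gram slices over object-code streams, then a scalar product of document vectors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each document image has already been reduced to a list of integer object codes by the connected-component and clustering stages, it picks N = 3 arbitrarily, and it normalizes the scalar product (cosine similarity), which is an assumption the abstract does not state.

```python
from collections import Counter
from math import sqrt


def ngram_vector(codes, n=3):
    """Count the N-gram slices in one document's stream of object codes."""
    return Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))


def scalar_product(u, v):
    """Scalar (dot) product of two sparse N-gram count vectors.

    Indexing a Counter with a missing key returns 0 without inserting it.
    """
    return sum(count * v[gram] for gram, count in u.items())


def similarity(codes_a, codes_b, n=3):
    """Normalized scalar product (cosine) of two documents' N-gram vectors.

    The normalization is an assumption of this sketch; the abstract only
    says "scalar product of the document vectors".
    """
    u, v = ngram_vector(codes_a, n), ngram_vector(codes_b, n)
    norm = sqrt(scalar_product(u, u)) * sqrt(scalar_product(v, v))
    return scalar_product(u, v) / norm if norm else 0.0
```

For example, the streams `[3, 1, 4, 1, 5, 9, 2, 6]` and `[3, 1, 4, 1, 5, 8, 2, 6]` each yield six distinct trigrams, three of which are shared, so `similarity` returns 0.5 (up to floating-point rounding). Normalizing keeps long and short documents on the same scale; dropping it gives the raw scalar product.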

36 citations

Book Chapter

Robust argumentative zoning for sensemaking in scholarly documents

Simone Teufel¹, Min-Yen Kan²

University of Cambridge¹, National University of Singapore²

15 Jun 2009

TL;DR: An automated approach to classify sentences of scholarly work with respect to their rhetorical function is presented, which is robust to noise and can process raw text.

Abstract: We present an automated approach to classify sentences of scholarly work with respect to their rhetorical function. While previous work that achieves this task of argumentative zoning requires richly annotated input, our approach is robust to noise and can process raw text. Even in cases where the input has noise (as it is obtained from optical character recognition or text extraction from PDF files), our robust classifier is largely accurate. We perform an in-depth study of our system both with clean and noisy inputs. We also give preliminary results from in situ acceptability testing when the classifier is embedded within a digital library reading environment.

36 citations

Journal Article

Multi-level image thresholding based on Kapur and Tsallis entropy using firefly algorithm

Abhay Sharma¹, Rekha Chaturvedi¹, Sandeep Kumar¹, Umesh Kumar Dwivedi¹

Amity University¹

12 May 2020, Journal of Interdisciplinary Mathematics

TL;DR: This work introduces entropy-based thresholding with a metaheuristic approach to find optimal thresholds for grayscale images, and finds that the Tsallis method offers better PSNR and SSIM values and is capable of effective image segmentation.

Abstract: Image segmentation is a necessity for many applications, such as brain tumor detection, optical character recognition, thermal energy leakage detection, and face recognition. Multilevel thresholding is the...
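
The objective behind Kapur-style multilevel thresholding can be sketched briefly: the thresholds split the grey-level histogram into classes, and the chosen thresholds maximize the sum of the Shannon entropies of those classes. In this sketch an exhaustive search stands in for the firefly algorithm used in the paper (a metaheuristic only becomes necessary when the number of thresholds makes exhaustive search too slow), and the Tsallis variant is not shown. The function names and the `levels` parameter are hypothetical.

```python
from itertools import combinations
from math import log


def kapur_entropy(p, thresholds):
    """Sum of the Shannon entropies of the classes cut from distribution p.

    Class k covers grey levels [bounds[k], bounds[k + 1]).
    """
    bounds = [0, *thresholds, len(p)]
    score = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        w = sum(p[lo:hi])  # probability mass of this class
        if w == 0:
            continue  # an empty class contributes no entropy
        score -= sum((pi / w) * log(pi / w) for pi in p[lo:hi] if pi > 0)
    return score


def best_thresholds(hist, levels=2):
    """Exhaustively find the `levels` thresholds maximizing Kapur's entropy.

    Stand-in for the firefly search: every increasing tuple of thresholds
    in 1..len(hist)-1 is scored, so cost grows combinatorially with levels.
    """
    total = float(sum(hist))
    p = [h / total for h in hist]
    candidates = combinations(range(1, len(p)), levels)
    return max(candidates, key=lambda t: kapur_entropy(p, t))
```

On a flat four-bin histogram `[1, 1, 1, 1]` with one threshold, splitting in the middle gives two classes of entropy ln 2 each, so `best_thresholds([1, 1, 1, 1], levels=1)` returns `(2,)`.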

36 citations

Patent

Identifying Matching Canonical Documents Consistent with Visual Query Structural Information

David Petrou¹, Ashok C. Popat¹, Matthew R. Casey¹

Google¹

01 Dec 2011

TL;DR: In this article, a server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the query.

Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
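
The scoring and retrieval steps in the claim can be illustrated with a small sketch: keep maximal runs of characters whose OCR score clears a threshold, then retrieve documents that contain every such run. This is a hypothetical illustration, not Google's method. The score threshold, minimum run length, and function names are assumptions, the corpus is a plain dictionary of document texts, and the check for consistency with the visual query's structural information is omitted.

```python
def high_quality_strings(chars, min_score=0.9, min_length=4):
    """Split OCR output into maximal runs of high-scoring characters.

    `chars` is a list of (character, score) pairs from one contiguous
    region; min_score and min_length are hypothetical tuning parameters.
    """
    runs, current = [], []
    for ch, score in chars:
        if score >= min_score:
            current.append(ch)
        else:
            # A low-scoring character ends the current run.
            if len(current) >= min_length:
                runs.append("".join(current))
            current = []
    if len(current) >= min_length:
        runs.append("".join(current))
    return runs


def matching_documents(strings, corpus):
    """Return names of documents whose text contains every high-quality string.

    Structural-consistency checks described in the claim are not modeled.
    """
    return [name for name, text in corpus.items()
            if all(s in text for s in strings)]
```

If the OCR output reads "Optical" and "character" with high scores but the space between them is recognized as `?` with a score of 0.3, the first function returns `["Optical", "character"]`, and only a document containing both strings is retrieved.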

36 citations

Proceedings Article

An object attribute thresholding algorithm for document image binarization

Y. Liu¹, R. Feinrich, Sargur N. Srihari

State University of New York System¹

20 Oct 1993

TL;DR: Preliminary results on a new approach to document image binarization, an algorithm based on gray scale histogram and run-length histogram analysis, show that over 99% of such address blocks can be correctly binarized.

Abstract: Document image binarization is not a completely solved problem for unconstrained document images. Binarization algorithms, whether global or local, can easily fail on images with noisy or complex background, or poor contrast. The authors report preliminary results on a new approach to document image binarization, an algorithm based on gray scale histogram and run-length histogram analysis. Experimental results on unconstrained machine printed address blocks from the US letter mail stream show that over 99% of such address blocks can be correctly binarized.
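
The abstract names the two inputs of the algorithm but not how they are combined, so the sketch below only computes them: a grey-level histogram of the image, and a histogram of horizontal run lengths of "ink" pixels at a candidate threshold. It is not the authors' threshold-selection rule. It assumes an 8-bit image stored as a list of rows and dark text on a light background (pixels below the threshold count as ink); the function names are hypothetical.

```python
from collections import Counter


def gray_histogram(image, levels=256):
    """Count how many pixels take each grey level (8-bit image assumed)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def run_length_histogram(image, threshold):
    """Histogram of horizontal run lengths of ink pixels (value < threshold).

    Maps run length -> number of runs of that length across all rows.
    """
    runs = Counter()
    for row in image:
        length = 0
        for value in row:
            if value < threshold:
                length += 1
            elif length:
                runs[length] += 1  # a background pixel closes the run
                length = 0
        if length:
            runs[length] += 1  # close a run that reaches the row's end
    return runs
```

Comparing `run_length_histogram` across candidate thresholds shows how stroke widths change as the threshold moves, which is the kind of evidence a run-length analysis can use alongside the grey-level histogram.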

36 citations

Metrics

Papers: 7,941
Citations: 180,323

No. of papers in the topic in previous years
Year	Papers
2023	186
2022	425
2021	333
2020	448
2019	430
2018	357
