PhotoOCR: Reading Text in Uncontrolled Conditions
Citations
An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
Speeding up Convolutional Neural Networks with Low Rank Expansions
Reading Text in the Wild with Convolutional Neural Networks
Arbitrary-Oriented Scene Text Detection via Rotation Proposals
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
References
Gradient-based learning applied to document recognition
Histograms of oriented gradients for human detection
Rapid object detection using a boosted cascade of simple features
Artificial Intelligence: A Modern Approach
Rectified Linear Units Improve Restricted Boltzmann Machines
Frequently Asked Questions (11)
Q2. What is the first method used to extract text from a binary image?
The input image is binarized using Niblack binarization [19], a morphological opening operation is applied, and connected components are extracted from the resulting binary image.
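The three steps in this answer (Niblack binarization, morphological opening, connected component extraction) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the window radius, the Niblack weight `k`, the 2x2 structuring element, and 4-connectivity are all assumptions chosen for a tiny example, and dark text on a light background is assumed.

```python
from math import sqrt

def niblack_binarize(img, radius=1, k=-0.2):
    """Mark a pixel as foreground (1) when it is darker than the local
    Niblack threshold T = mean + k * std over a (2*radius+1)^2 window.
    radius and k are illustrative values, not taken from the paper."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            std = sqrt(sum((v - mean) ** 2 for v in window) / len(window))
            out[y][x] = 1 if img[y][x] < mean + k * std else 0
    return out

SQUARE_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (dy, dx) offsets, assumed shape

def erode(mask, offsets):
    """Keep a pixel only if the structuring element fits entirely in the foreground."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]

def dilate(mask, offsets):
    """Paint the structuring element at every foreground pixel."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y - dy < h and 0 <= x - dx < w and mask[y - dy][x - dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]

def opening(mask, offsets=SQUARE_2X2):
    """Morphological opening: erosion followed by dilation (removes small specks)."""
    return dilate(erode(mask, offsets), offsets)

def connected_components(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            stack, xs, ys = [(y, x)], [], []
            seen[y][x] = True
            while stack:  # iterative flood fill
                cy, cx = stack.pop()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def extract_candidates(img):
    """Binarize, open, then return one bounding box per connected component."""
    return connected_components(opening(niblack_binarize(img)))
```

Here `extract_candidates` takes a grayscale image as a list of rows and returns one box per surviving blob. The opening step is what discards isolated specks smaller than the structuring element before components are collected.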
Q3. How do the authors train their neural network character classifier?
The authors train their neural network character classifier using stochastic gradient descent with Adagrad [7] and dropout [10], using the distributed training design described in [6].
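To make the two named ingredients concrete, here is a small sketch of an Adagrad update and inverted dropout, applied to a toy linear model. Everything about the model is an assumption for illustration: the paper's classifier is a deep network, dropout is applied here to the inputs only for brevity, and the distributed training design of [6] is not shown.

```python
import random
from math import sqrt

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """In-place Adagrad update: each parameter gets its own step size,
    scaled down by the root of its accumulated squared gradients."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (sqrt(accum[i]) + eps)

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    rescale the survivors so the expected activation is unchanged."""
    if rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def mse(w, data):
    """Mean squared error of the linear model y ~ w . x."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)

def train(data, dim, epochs=200, rate=0.0, lr=0.1, seed=0):
    """Stochastic gradient descent, one example per update, with Adagrad
    step sizes and optional dropout on the inputs (toy setup)."""
    rng = random.Random(seed)
    w, accum = [0.0] * dim, [0.0] * dim
    order = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            x_in = dropout(x, rate, rng)
            err = sum(wi * xi for wi, xi in zip(w, x_in)) - y
            grads = [2.0 * err * xi for xi in x_in]  # d/dw of err^2
            adagrad_step(w, grads, accum, lr)
    return w
```

Calling `train(data, dim=2)` on a handful of `(features, target)` pairs returns the fitted weights; passing `rate=0.5` turns dropout on. At inference time dropout is simply not applied, which is why the survivors are rescaled during training.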
Q4. How do traditional OCR systems detect text, and why is that a problem for uncontrolled images?
Traditional OCR systems typically rely on brittle techniques such as binarization, where the first stage of processing is a simple thresholding operation used to divide an image into text and non-text pixels [19].
Q5. How much data are the character classifier and the language model trained on?
The deep neural network character classifier is trained on up to 2 million manually labelled examples, and the language model is learned on a corpus of more than a trillion tokens.
Q6. How much does the larger beam improve the recall?
The larger beam width improves recall by only 0.5%, suggesting that approximate inference is not an important limit on performance.
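For readers unfamiliar with the term, the sketch below shows what a beam width controls in a generic beam search. The labels, probabilities, and bigram scores are made up for illustration and are not from the paper. In this toy case a width of 1 misses the best hypothesis, whereas the paper's measurement suggests that in practice widening the beam changes little.

```python
from math import log

# Toy per-position label log-probabilities (hypothetical values).
EMISSIONS = [
    {"r": log(0.6), "n": log(0.4)},
    {"o": log(1.0)},
]

# Toy bigram probabilities standing in for a language model (hypothetical).
BIGRAM = {("n", "o"): 0.9, ("r", "o"): 0.2}

def transition(prev, cur):
    """Log-score of label `cur` following label `prev`; unseen pairs get 0.05."""
    return log(BIGRAM.get((prev, cur), 0.05))

def beam_search(emissions, transition, beam_width):
    """Extend hypotheses one position at a time, keeping only the
    `beam_width` highest-scoring partial label sequences at each step.
    Returns the best (labels, log_score) pair found."""
    beam = [((), 0.0)]
    for scores in emissions:
        candidates = []
        for labels, total in beam:
            for label, log_p in scores.items():
                t = transition(labels[-1], label) if labels else 0.0
                candidates.append((labels + (label,), total + log_p + t))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]  # pruning: the approximation step
    return beam[0]

narrow = beam_search(EMISSIONS, transition, beam_width=1)
wide = beam_search(EMISSIONS, transition, beam_width=2)
```

With width 1 the search commits to `r` at the first position and ends at `("r", "o")`; with width 2 it keeps `n` alive long enough for the bigram score to favour `("n", "o")`.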
Q7. How many languages can be recognized simultaneously?
A single worker is designed to recognize multiple languages simultaneously; at present the authors support 29 languages with Latin script.
Q8. How does the system perform on the major text recognition benchmarks?
The system achieves record performance on all major text recognition benchmarks, and high quality text extraction from typical smartphone imagery with sub-second latency.
Q9. What is the classifier probability for the ith segment?
$c$ is the vector of class assignments, with the $i$th segment assigned label $c_i$. $\Psi(c_i, b_i, b_{i+1})$ is the classifier probability of class assignment $c_i$ for the pixels between breakpoints $b_i$ and $b_{i+1}$.
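As a rough illustration of how these quantities combine, the sketch below scores one candidate segmentation by summing the log classifier probabilities of its segments. This is an assumption-laden simplification: the lookup table `PSI` is a hypothetical stand-in for a real classifier, and any language model term or normalization in the paper's full objective is omitted.

```python
from math import log

# Hypothetical classifier outputs keyed by (label, start, end) in pixels.
PSI = {
    ("r", 0, 5): 0.7,
    ("n", 5, 10): 0.6,
    ("m", 0, 10): 0.5,
}

def psi(label, start, end):
    """Stand-in for the classifier probability of `label` for the pixels
    between `start` and `end`; unknown triples get a tiny probability."""
    return PSI.get((label, start, end), 1e-6)

def segmentation_log_score(breakpoints, labels, psi):
    """Sum of log psi(c_i, b_i, b_{i+1}) over consecutive breakpoint pairs."""
    if len(labels) != len(breakpoints) - 1:
        raise ValueError("need exactly one label per segment")
    return sum(log(psi(c, b0, b1))
               for c, b0, b1 in zip(labels, breakpoints, breakpoints[1:]))

score_rn = segmentation_log_score([0, 5, 10], ["r", "n"], psi)
score_m = segmentation_log_score([0, 10], ["m"], psi)
```

The two calls compare reading the same ten pixels as two segments ("rn") or as one ("m"), a familiar OCR ambiguity. With these made-up numbers the single-segment reading scores higher.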
Q10. What is the way to correct the character labels?
The extracted character bounding boxes come from their OCR system, but any errors in the character labels are corrected by alignment against the source text.
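One way to picture this correction step is a character-level alignment between the OCR output and the known source text. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in aligner; the source does not say which alignment method the authors use, and the box format and the policy of dropping ambiguous spans are assumptions.

```python
from difflib import SequenceMatcher

def correct_labels(ocr_chars, source_text):
    """ocr_chars is a list of (box, label) pairs in reading order.
    Returns (box, label) pairs whose labels agree with `source_text`.
    Spans that cannot be aligned one-to-one are dropped (assumed policy)."""
    ocr_text = "".join(label for _, label in ocr_chars)
    matcher = SequenceMatcher(None, ocr_text, source_text, autojunk=False)
    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # OCR label already matches the source text.
            corrected.extend(ocr_chars[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of characters on both sides: keep each box,
            # take the label from the source text.
            for (box, _), ch in zip(ocr_chars[i1:i2], source_text[j1:j2]):
                corrected.append((box, ch))
        # Insertions, deletions, and uneven replacements are skipped,
        # since the box-to-character mapping is ambiguous there.
    return corrected

# Hypothetical boxes (x0, y0, x1, y1); the OCR misread "l" as "1".
ocr = [((0, 0, 8, 12), "h"), ((9, 0, 17, 12), "e"), ((18, 0, 22, 12), "1"),
       ((23, 0, 27, 12), "l"), ((28, 0, 36, 12), "o")]
fixed = correct_labels(ocr, "hello")
```

In this example the bounding boxes are left untouched and only the third label changes, which matches the idea of reusing OCR geometry while trusting the source text for the labels.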
Q11. How much is the word error rate reduced by adding the word level model?
Adding the word level model gives a further 4% word error rate reduction.

Training Set Size    Character Classifier Accuracy (%)    Word Recognition Rate (%)
1.1 × 10^7 (*)       92.18                                70.99
3.9 × 10^6           91.79                                70.47
1.9 × 10^6           90.98                                68.83
9.7 × 10^5           90.60                                69.20
4.9 × 10^5           89.19                                65.77
2.5 × 10^5           88.38                                60.50
1.2 × 10^5           86.74                                53.20
6.3 × 10^4           85.21                                46.74