Detecting and reading text in natural scenes
Citations
Detecting text in natural scenes with stroke width transform
End-to-end scene text recognition
Reading Text in the Wild with Convolutional Neural Networks
Arbitrary-Oriented Scene Text Detection via Rotation Proposals
End-to-end text recognition with convolutional neural networks
References
Eigenfaces for recognition
Experiments with a new boosting algorithm
Additive Logistic Regression : A Statistical View of Boosting
Neural network-based face detection
Frequently Asked Questions (14)
Q2. What is the most important component of the algorithm?
The first, and most important, component of the algorithm is a strong classifier which is trained by the AdaBoost learning algorithm [4], [19], [20] on labelled data.
Q3. What methods were used to learn the strong classifier?
The authors used standard AdaBoost training methods to learn the strong classifier [4] [5] combined with Viola and Jones’ cascade approach which uses asymmetric weighting [19].
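As a rough illustration of what "standard AdaBoost training" involves, the sketch below implements plain discrete AdaBoost over threshold stumps. It is a minimal sketch under stated assumptions, not the authors' implementation: the stump weak learners, the function names `train_adaboost` and `predict`, and the toy data are all hypothetical, and neither the cascade nor the asymmetric weighting mentioned above is shown.

```python
import numpy as np

def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost with threshold stumps; labels y must be in {-1, +1}."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # start from uniform sample weights
    stumps = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # vote of this weak learner
        stumps.append((alpha, j, thr, sign))
        # Increase the weight of misclassified samples, then renormalize.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
    return stumps

def predict(stumps, X):
    """Sign of the weighted vote of all stumps."""
    X = np.asarray(X, dtype=np.float64)
    score = sum(a * np.where(s * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, s in stumps)
    return np.where(score >= 0, 1, -1)

# Toy 1-D data that no single stump can separate.
X_toy = [[0], [1], [2], [3], [4], [5]]
y_toy = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X_toy, y_toy, rounds=3)
labels = predict(model, X_toy)
```

On this toy set a single stump always misclassifies two samples, while the weighted vote of three stumps separates the classes, which is the sense in which boosting turns weak classifiers into a strong one.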
Q4. What fraction of the visible text did the AdaBoost strong classifier detect?
The AdaBoost strong classifier (plus extension/binarization) detected 97.2% of the visible text in their test dataset (text that could be detected by a normally sighted viewer).
Q5. What is the way to classify text?
In ideal text images, the authors would be able to classify pixels as text or background directly from the intensity histogram which should have two peaks corresponding to text and background mean intensity.
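To make the two-peak idea concrete, here is a minimal sketch that picks a single global threshold from the intensity histogram. It uses Otsu's between-class variance criterion, which is a substitute chosen for illustration and is not named in the source; the function name `otsu_threshold` and the bin count are assumptions.

```python
import numpy as np

def otsu_threshold(pixels, bins=256):
    """Global threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(np.asarray(pixels, dtype=np.float64).ravel(), bins=bins)
    p = hist / hist.sum()                       # normalized histogram
    centers = 0.5 * (edges[:-1] + edges[1:])    # representative value per bin
    w0 = np.cumsum(p)                           # mass of the "low" class
    w1 = 1.0 - w0                               # mass of the "high" class
    m = np.cumsum(p * centers)                  # cumulative first moment
    mt = m[-1]                                  # global mean
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mt * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]

# Ideal bimodal case: two intensity levels, one for text and one for background.
ideal = np.array([40.0] * 50 + [200.0] * 50)
t = otsu_threshold(ideal)
is_bright = ideal > t
```

A single global threshold like this only works when the histogram really is bimodal, which is the ideal case the answer describes.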
Q6. What is the advantage of the AdaBoost algorithm?
These algorithms have the additional advantage that they use generative models [18] and can be applied directly to the image intensity without requiring binarization.
Q7. What is the most common remaining error?
The most common remaining errors are text strings such as “111” or “Ill”, which correspond to vertical edges in the image caused, for example, by iron railings.
Q8. What are the properties with low entropy?
These are also properties with low entropy, since there will typically be a fixed number of long edges whatever the letters in the text region.
Q9. Which features do the first three layers of the cascade use?
The first three layers of the cascade only use mean, STD and module of derivative features, since they can be easily calculated from integral images [19].
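The reason these features are cheap can be sketched as follows: once an integral image is built, the sum over any rectangular block costs four lookups, so the block mean and standard deviation follow from two integral images (of the intensities and of their squares). This is a minimal sketch; the helper names `integral_image`, `box_sum` and `block_mean_std` are hypothetical, and the derivative features are not shown.

```python
import numpy as np

def integral_image(a):
    """Summed-area table with an extra zero row and column at the top-left."""
    a = np.asarray(a, dtype=np.float64)
    return np.pad(a, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, height, width):
    """Sum over rows top..top+height-1 and cols left..left+width-1 in O(1)."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def block_mean_std(image, top, left, height, width):
    """Mean and standard deviation of a block from two integral images."""
    img = np.asarray(image, dtype=np.float64)
    ii, ii2 = integral_image(img), integral_image(img ** 2)
    n = height * width
    mean = box_sum(ii, top, left, height, width) / n
    # Var = E[x^2] - E[x]^2, clipped at zero against rounding error.
    var = max(box_sum(ii2, top, left, height, width) / n - mean ** 2, 0.0)
    return mean, var ** 0.5

img = np.arange(12).reshape(3, 4)
mean, std = block_mean_std(img, top=1, left=1, height=2, width=2)
```

In practice the two integral images would be computed once per image and reused for every block, rather than rebuilt on each call as in this sketch.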
Q10. How does Niblack’s algorithm determine the threshold?
Niblack’s algorithm requires adaptively determining a threshold $T_r(x)$ for each pixel $x$ from the intensity statistics within a local window of size $r$: $T_r(x) = \mu_r(x) + k \cdot \sigma_r(x)$, where $\mu_r(x)$ and $\sigma_r(x)$ are the mean and standard deviation (std) of the pixel intensities within the window.
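The threshold rule above can be sketched directly. This is a minimal, unoptimized sketch under stated assumptions: `r` is treated as the half-width of a square window (the source only says "size r"), windows are clipped at the image border, the default `k` is a placeholder rather than a value from the paper, and the function name `niblack_binarize` is hypothetical.

```python
import numpy as np

def niblack_binarize(image, r=3, k=-0.2):
    """Per-pixel threshold T = mean + k * std over a local window.

    Returns (mask, threshold), where mask is True for pixels brighter
    than their local threshold.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    threshold = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            # Window of half-width r around (y, x), clipped to the image.
            win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            threshold[y, x] = win.mean() + k * win.std()
    return img > threshold, threshold

# A single bright pixel on a dark background.
img = np.zeros((9, 9))
img[4, 4] = 100.0
mask, thr = niblack_binarize(img, r=1, k=0.5)
```

Whether text ends up above or below the threshold depends on its polarity (bright on dark or dark on bright), so a real pipeline would need to handle both cases.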
Q11. What was the way to classify windows in the training images?
After training with these samples, the authors applied the AdaBoost algorithm to classify all windows in the training images (at a range of sizes).
Q12. What is the third component of the algorithm?
The third component is an OCR software program which acts on the binarized regions (the OCR software gave far worse performance when applied directly to the image).
Q13. What is the result of this feature selection approach?
The result of this feature selection approach is that their final strong classifier (see next section) uses far fewer filters than Viola and Jones’ face detection classifier [19].
Q14. How were text windows divided into segments in the training dataset?
The authors performed this labelling for the training dataset and divided each text window into several overlapping text segments with fixed width-to-height ratio 2:1.
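One way such a division could work is sketched below. Only the 2:1 width-to-height ratio comes from the text; the 50% overlap, the handling of the last segment, the treatment of windows narrower than one segment, and the function name `split_into_segments` are all assumptions made for illustration.

```python
def split_into_segments(x, y, width, height, overlap=0.5):
    """Split a text window into overlapping (x, y, w, h) segments with w = 2h."""
    seg_w = 2 * height                      # fixed 2:1 width-to-height ratio
    if width < seg_w:
        return []                           # assumption: skip windows that are too narrow
    step = max(1, int(seg_w * (1 - overlap)))
    starts = list(range(0, width - seg_w + 1, step))
    # Assumption: add a final segment flush with the right edge if needed.
    if starts[-1] != width - seg_w:
        starts.append(width - seg_w)
    return [(x + s, y, seg_w, height) for s in starts]

segments = split_into_segments(x=0, y=0, width=90, height=20)
```

For a 90 x 20 window this yields four 40 x 20 segments starting at x = 0, 20, 40 and 50, the last one shifted so that it ends at the window's right edge.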