Layout Analysis for Arabic Historical Document Images Using Machine Learning
Citations
Page segmentation of historical document images with convolutional autoencoders
DocBank: A Benchmark Dataset for Document Layout Analysis
Document Layout Analysis: A Comprehensive Survey
Convolutional Neural Networks for Page Segmentation of Historical Document Images
References
Texture Analysis Methods - A Review
Text line segmentation of historical documents: a survey
Segmentation of Page Images Using the Area Voronoi Diagram
Document structure analysis algorithms: a literature survey
Page segmentation using texture analysis
Frequently Asked Questions (17)
Q2. What have the authors stated for future works in "Layout analysis for arabic historical document images using machine learning" ?
Their future work will focus on improving some aspects of the algorithm.
Q3. Why did layout analysis of ancient manuscripts become a challenging problem?
Due to looser formatting rules, non-rectangular layouts, and irregularities in the location of layout entities [2, 11], layout analysis of handwritten ancient documents has become a challenging research problem.
Q4. How did the authors improve the segmentation results?
The authors improve the segmentation results by applying nearest-neighbor analysis and by using class probabilities to refine the class label of each connected component.
Q5. How do the authors measure the segmentation accuracy of the document images?
The authors evaluate segmentation accuracy using the F-measure metric, which combines precision and recall into a single scalar value.
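The F-measure mentioned above can be computed directly from precision and recall. A minimal sketch of the general F-beta form, where beta = 1 gives the balanced harmonic mean (F1) the answer refers to:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Combine precision and recall into a single scalar (F1 when beta=1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(0.8, 0.6))  # harmonic mean of 0.8 and 0.6, about 0.686
```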
Q6. What is the neighborhood of the document?
The considered neighborhood is calculated adaptively as a function of the component’s width and height (denoted by w and h, respectively), and is w_factor × w by h_factor × h, where w_factor is always greater than h_factor because of the horizontal nature of Arabic script.
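The adaptive neighborhood described above can be sketched as follows. The concrete factor values are illustrative assumptions, since the answer only states that w_factor exceeds h_factor:

```python
def adaptive_neighborhood(w: float, h: float,
                          w_factor: float = 3.0,   # assumed value
                          h_factor: float = 1.5):  # assumed value
    """Neighborhood of a connected component, scaled from its own size.

    w_factor is kept larger than h_factor to reflect the horizontal
    nature of Arabic script.
    """
    assert w_factor > h_factor
    return (w_factor * w, h_factor * h)  # (neighborhood width, height)
```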
Q7. What is the size of the context-based feature vector?
To generate context-basedfeature vector, each connected component with its surrounding context area is rescaled to a 64 × 64 window size, while the connected component is kept at the center of the window.
Q8. How do the authors improve the segmentation results?
To improve the segmentation results, the authors use a post-processing step based on a relaxation labeling approach, which is described below.
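One common form of relaxation labeling, not necessarily the exact update rule the authors use, iteratively re-weights each component's class probabilities by the support of its neighbours under a class-compatibility matrix. A minimal sketch:

```python
def relax_labels(probs, neighbors, compat, iters=5):
    """Iteratively refine per-component class probabilities.

    probs:     probs[i][c] = probability that component i has class c
    neighbors: neighbors[i] = indices of components adjacent to i
    compat:    compat[c][d] = compatibility of class c next to class d
    """
    n_classes = len(probs[0])
    for _ in range(iters):
        updated = []
        for i, p in enumerate(probs):
            # support for each class c from all neighbours' current beliefs
            support = [sum(compat[c][d] * probs[j][d]
                           for j in neighbors[i]
                           for d in range(n_classes))
                       for c in range(n_classes)]
            raw = [p[c] * (1.0 + support[c]) for c in range(n_classes)]
            z = sum(raw) or 1.0
            updated.append([v / z for v in raw])  # renormalise
        probs = updated
    return probs
```

With an identity-like compatibility matrix, a weakly labelled component surrounded by confident main-body neighbours drifts toward the main-body class.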
Q9. What is the method for analyzing side-notes?
Conventional methods for geometric layout analysis could be an adequate choice for the side-notes segmentation problem when main-body and side-note text have salient, differentiable geometric properties, such as text orientation, text size, and white-space locations.
Q10. Why is the proposed method not applicable?
Due to the challenges posed by handwritten historical documents [2], traditional page layout analysis methods, which usually address machine-printed documents, are not applicable.
Q11. What is the importance of generalization when training a model?
It is widely known that generalization is a critical issue when training a model, that is, producing a model that can reliably predict the correct class of a sample that does not appear in the training set.
Q12. How did the authors generate the ground truth?
Pixel-level ground truth was generated by manually assigning the text in the documents of the testing set to one of two classes, main-body or side-notes text.
Q13. What is the effect of the projection profile on the robustness of the approach?
One can also notice that the robustness of this approach can be negatively affected once the side-notes text has the same orientation as the main-body text and there is no salient space between the two.
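A projection-profile separator of the kind discussed here can be sketched as follows: count ink pixels per column and take the widest run of empty columns as the white space between main-body and side-notes text. This fails exactly in the case the answer describes, when the two text types share an orientation and no such gap exists:

```python
def column_profile(img):
    """Column-wise ink counts for a binary image (list of rows)."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]

def widest_gap(profile):
    """Return (start column, width) of the widest run of zeros in a
    column-wise ink-count profile; width 0 means no gap was found."""
    best = cur = start = best_start = 0
    for i, v in enumerate(list(profile) + [1]):  # sentinel closes a trailing run
        if v == 0:
            if cur == 0:
                start = i
            cur += 1
            if cur > best:
                best, best_start = cur, start
        else:
            cur = 0
    return best_start, best
```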
Q14. What is the main problem with classifier tuning?
In general, classifier tuning is a hard problem with respect to the optimization of sensitive parameters, e.g., the penalty parameter C and the kernel parameter gamma of an SVM classifier.
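Tuning C and gamma is typically done by exhaustive grid search over cross-validated scores. A minimal, library-free sketch, where `score` is a hypothetical function assumed to return, e.g., the k-fold cross-validation accuracy of an SVM trained with the given pair:

```python
from itertools import product

def grid_search(score, c_grid, gamma_grid):
    """Return the (C, gamma) pair with the highest score on the grid."""
    return max(product(c_grid, gamma_grid), key=lambda p: score(*p))
```

In practice the grids are usually logarithmic (e.g. powers of ten), since both parameters act multiplicatively on the decision function.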
Q15. What is the size of the dataset?
Their dataset consists of 38 document images, some scanned at a private library in the Old City of Jerusalem and others collected from the Islamic manuscripts digitization project at Leipzig University Library [1].
Q16. What is the size of the rescaled main-body and side-notes components?
Main-body text and side-notes text are separated and extracted from the original document images to generate the ground truth for the training phase.
Q17. What is the proposed method for detecting layout entities locally?
The proposed method suggests a part-based detection of layout entities locally, using a multi-stage algorithm for the localization of the entities based on interest points.