High Performance Layout Analysis of Medieval European Document Images.

doi:10.5220/0006574603240331

Open Access, Proceedings Article

- pp 324-331

Abstract:

Layout analysis, which mainly comprises binarization and page segmentation, is one of the most important performance-determining steps of an OCR system for complex medieval document images, which contain noise, distortions and irregular layouts. In this paper, we present high-performance page segmentation techniques for medieval European document images, which include a novel main-body and side-notes segregation and an improved version of OCRopus-based text line extraction (OCRopus, ). To complete the high-performance layout analysis pipeline, we also present the application of percentile-based binarization (Afzal et al., 2014) and multiresolution morphology-based text and non-text segmentation (Bukhari et al., 2011) to historical document images. The presented layout analysis techniques are applied to a collection of 15th-century Latin document images, and each segmentation technique achieves more than 90% accuracy.
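As a rough illustration of the idea behind percentile-based binarization, the sketch below estimates the ink and paper intensity levels of a grayscale page from two percentiles, rescales the image between them, and thresholds the result. This is not the method of Afzal et al. (2014) as published; the function name `percentile_binarize`, the parameters `low_pct`, `high_pct` and `threshold`, and their default values are all assumptions chosen for illustration.

```python
import numpy as np


def percentile_binarize(gray, low_pct=5.0, high_pct=90.0, threshold=0.5):
    """Binarize a grayscale page using percentile-estimated ink/paper levels.

    Hypothetical helper for illustration only: the percentiles and the
    threshold are assumed values, not parameters taken from the paper.
    Returns a boolean mask where True marks ink (dark) pixels.
    """
    img = np.asarray(gray, dtype=np.float64)

    # Estimate the ink level (dark end) and paper level (bright end).
    lo = np.percentile(img, low_pct)
    hi = np.percentile(img, high_pct)

    # A page with no contrast has nothing to separate: report no ink.
    if hi <= lo:
        return np.zeros(img.shape, dtype=bool)

    # Rescale so the ink level maps to 0 and the paper level maps to 1.
    norm = np.clip((img - lo) / (hi - lo), 0.0, 1.0)

    # Pixels closer to the ink level than the threshold count as ink.
    return norm < threshold


# Usage: a synthetic bright page with one dark two-pixel-high text stripe.
page = np.full((20, 20), 200.0)
page[5:7, :] = 30.0
ink_mask = percentile_binarize(page)
```

Using percentiles rather than the raw minimum and maximum makes the estimated levels less sensitive to isolated specks and stains, which is one plausible reason such schemes suit noisy historical scans.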

Citations

anyOCR: An Open-Source OCR System for Historical Archives

Multi-scale Gated Fully Convolutional DenseNets for semantic labeling of historical newspaper images

Segmentation-Less Extraction of Text and Non-Text Regions From JPEG 2000 Compressed Document Images Through Partial and Intelligent Decompression

References

Document analysis system

The document spectrum for page layout analysis

Twenty years of document image analysis in PAMI

A prototype document image analysis system for technical journals

Related Papers (1)

The Origins and Rise of Medieval Information Visualization