A color-based layout analysis to process censorship cards of film archives
Summary
1. Introduction
- Many institutions that collect and preserve cultural heritage, such as historical documents, have shown great interest in digitizing their resources and in providing online access to the digitized material.
- This paper presents layout analysis issues and problems addressed in the EU funded project COLLATE, whose main goal is to provide film archivists adequate access to historic film-related documents and their associated metadata [5].
- Finally, conclusions are drawn in Section 4.
2. The approach
- A naïve approach to color document image processing would be to separate different colors and to process [...]
- Published in: Proceedings of the 2005 Eighth International Conference on Document Analysis and Recognition (ICDAR’05).
- Images are segmented again and the spatial merging is applied on intersecting blocks.
- At each step, the dissimilarity between two clusters of colors (inter-cluster dissimilarity) is evaluated on the basis of two measures: a) the Euclidean distance between two colors taken from distinct clusters (nearest neighbor based dissimilarity); b) the Euclidean distance between the centroids of the two clusters (centroid-based dissimilarity).
- A first step towards the reconstruction of layout structure consists of classifying the blocks according to their content type: text, horizontal line, vertical line, picture (i.e. halftone images) and graphics (e.g. line drawings).
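The two inter-cluster dissimilarity measures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes colors are RGB triples, and it reads "nearest neighbor" as the minimum Euclidean distance over all pairs of colors drawn from the two clusters. The function names are hypothetical, and how the two measures are combined into a merge decision is not stated in the summary, so that step is left out.

```python
from itertools import product
from math import dist
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # assumed RGB triple


def nearest_neighbor_dissimilarity(a: Sequence[Color], b: Sequence[Color]) -> float:
    """Smallest Euclidean distance between a color in cluster a and one in cluster b."""
    return min(dist(p, q) for p, q in product(a, b))


def centroid(cluster: Sequence[Color]) -> Color:
    """Component-wise mean of the colors in a cluster."""
    n = len(cluster)
    return tuple(sum(c[i] for c in cluster) / n for i in range(3))


def centroid_dissimilarity(a: Sequence[Color], b: Sequence[Color]) -> float:
    """Euclidean distance between the centroids of the two clusters."""
    return dist(centroid(a), centroid(b))


# Two small illustrative clusters (made-up values).
reddish = [(250, 10, 10), (230, 30, 20)]
bluish = [(10, 10, 250), (30, 20, 230)]
print(nearest_neighbor_dissimilarity(reddish, bluish))
print(centroid_dissimilarity(reddish, bluish))
```

The nearest-neighbor measure reacts to the single closest pair of colors, while the centroid measure compares the clusters as a whole, which is presumably why both are evaluated at each step.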
3. Application
- In this section the authors empirically evaluate the proposed approach in terms of the capability to isolate interesting blocks of different color for subsequent logical labeling.
- Fig. 4 shows a document image of the NFA class, which is the most complex class to analyze because of its overall low quality.
- The document contains manual annotations (no_prec_doc, top right-hand corner), blue stamps (register_office and dispatch_officer, bottom of the page), red stamps (rubber_stamp, top left-hand corner) and revenue stamps (stamp, in the middle of the page).
- The color-based layout analysis is able to isolate them, while the b/w layout analysis returns a single layout block for the whole central part of the document image and two spurious blocks extracted from the bottom of the image.
- Indeed, for the FAA class, 205 components were labeled in the color setting against 140 in the b/w setting; for the NFA class, the figures are 64 against 12.
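To put the component counts above in proportion, the short sketch below computes the color-to-b/w ratio for each class. The ratios are plain arithmetic on the counts quoted in the summary, not figures reported by the authors, and the dictionary layout and function name are hypothetical.

```python
# Labeled-component counts quoted in the summary above.
labeled = {
    "FAA": {"color": 205, "bw": 140},
    "NFA": {"color": 64, "bw": 12},
}


def color_to_bw_ratio(counts: dict) -> float:
    """How many components the color setting labels per component labeled in b/w."""
    return counts["color"] / counts["bw"]


for doc_class, counts in labeled.items():
    print(f"{doc_class}: {color_to_bw_ratio(counts):.2f}x")
```

On these counts the gain is far larger for the low-quality NFA class than for the FAA class.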
4. Conclusions
- A new color-based layout analysis method has been proposed to meet the challenges of processing censorship cards from European film archives of the 1920s and 1930s.
- A comparison of the method with the original b/w version has been provided.
- Results show that the color-based approach isolates interesting blocks better than the previous version and provides a more accurate basis for understanding.
Citations
12 citations
Cites methods from "A color-based layout analysis to pr..."
...Layout analysis using color information have been proposed in [9]–[11] to handle color document images with complex layouts such as forms, text overlaid on image, posters etc....
[...]
References
61 citations
"A color-based layout analysis to pr..." refers methods in this paper
...Interesting examples are: the MASTER project, that has developed a standard for computer-readable descriptions of medieval manuscripts in European libraries with retrieval objectives [10]; the MEMORIAL project, whose goal is the establishment of a digital document workbench enabling the creation of distributed virtual archives of typewritten documents related to prisoners in World-War II concentration camps [3]; the Bovary project, that concerns the digitalization of 5,000 original manuscripts handwritten by Gustave Flaubert [12]; the D-SCRIBE project, that aims to develop an integrated system for digitization and processing of Old Greek manuscripts [6]....
[...]
45 citations
"A color-based layout analysis to pr..." refers background in this paper
...This paper presents layout analysis issues and problems addressed in the EU funded project COLLATE, whose main goal is to provide film archivists adequate access to historic film-related documents and their associated metadata [5]....
[...]
"A color-based layout analysis to pr..." refers background in this paper
...Thus, relations between color values and pixel positions in the image plane are not used [13] and the color homogeneity of spatially contiguous pixels is the only used criterion....
[...]
Frequently Asked Questions (2)
Q2. What have the authors stated for future work in "A color-based layout analysis to process censorship cards of film archives"?
For future work, the authors plan to evaluate the proposed approach in automatic/manual labeling.