Showing papers by "Thomas M. Breuel" published in 2006

Open Access

Book Chapter

Performance comparison of six algorithms for page segmentation

Faisal Shafait¹, Daniel Keysers¹, Thomas M. Breuel¹

German Research Centre for Artificial Intelligence¹

13 Feb 2006

TL;DR: It is shown that no single algorithm outperforms all other algorithms; however, the three best-performing algorithms are those based on constrained text-line finding, Docstrum, and the Voronoi diagram.

Abstract: This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and Voronoi-diagram-based. The evaluation is performed using a subset of the UW-III collection commonly used for evaluation, with a separate training set for parameter optimization. We compare the results using both default parameters and optimized parameters. In the course of the evaluation, the strengths and weaknesses of each algorithm are analyzed, and it is shown that no single algorithm outperforms all other algorithms. However, we observe that the three best-performing algorithms are those based on constrained text-line finding, Docstrum, and the Voronoi-diagram.

75 citations

Proceedings Article

Pixel-Accurate Representation and Evaluation of Page Segmentation in Document Images

Faisal Shafait¹, Daniel Keysers¹, Thomas M. Breuel¹

German Research Centre for Artificial Intelligence¹

20 Aug 2006

TL;DR: Presents a new representation and evaluation procedure for page segmentation algorithms and uses it to analyze six widely used layout analysis algorithms; the pixel-based representation permits easy interchange of segmentation results and ground truth.

Abstract: This paper presents a new representation and evaluation procedure for page segmentation algorithms and analyzes six widely used layout analysis algorithms using the procedure. The method permits a detailed analysis of the behavior of page segmentation algorithms in terms of over- and undersegmentation at different layout levels, as well as determination of the geometric accuracy of the segmentation. The representation of document layouts relies on labeling each pixel according to its function in the overall segmentation, permitting pixel-accurate representation of layout information of arbitrary layouts and allowing background pixels to be classified as "don't care". Our representations can be encoded easily in standard color image formats like PNG, permitting easy interchange of segmentation results and ground truth.

50 citations
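
The pixel-accurate representation stores one label per pixel, so a whole layout fits in an ordinary 24-bit color image. The sketch below assumes a hypothetical packing (zone type in the red channel, a 16-bit zone index in green and blue, white reserved for "don't care"); the paper's actual color encoding may differ. It also shows one simple way to flag oversegmentation on such label maps: a ground-truth zone whose pixels fall into more than one result zone. A real evaluation would additionally apply pixel-count thresholds.

```python
from typing import Dict, List, Optional, Set, Tuple

RGB = Tuple[int, int, int]
Label = Tuple[int, int]  # (zone type, zone index)
DONT_CARE: RGB = (255, 255, 255)  # hypothetical colour reserved for background


def encode_label(zone_type: int, zone_index: int) -> RGB:
    """Pack a zone type (0-254) and a zone index (0-65535) into one RGB pixel."""
    if not (0 <= zone_type < 255 and 0 <= zone_index < 1 << 16):
        raise ValueError("label out of range")
    return (zone_type, zone_index >> 8, zone_index & 0xFF)


def decode_label(pixel: RGB) -> Optional[Label]:
    """Inverse of encode_label; None means "don't care"."""
    if pixel == DONT_CARE:
        return None
    r, g, b = pixel
    return (r, (g << 8) | b)


def oversegmented(truth: List[List[RGB]], result: List[List[RGB]]) -> Set[Label]:
    """Ground-truth zones whose pixels fall into more than one result zone."""
    seen: Dict[Label, Set[Label]] = {}
    for t_row, r_row in zip(truth, result):
        for t_px, r_px in zip(t_row, r_row):
            t, r = decode_label(t_px), decode_label(r_px)
            if t is None or r is None:  # "don't care" pixels are ignored
                continue
            seen.setdefault(t, set()).add(r)
    return {t for t, zones in seen.items() if len(zones) > 1}
```

Because zone types stop at 254, no valid label can collide with the reserved white pixel.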

Proceedings Article

Printing Technique Classification for Document Counterfeit Detection

Christoph H. Lampert¹, Lin Mei¹, Thomas M. Breuel¹

German Research Centre for Artificial Intelligence¹

01 Nov 2006

TL;DR: A classification system is proposed that helps non-expert users distinguish original documents from PC-made forgeries by analyzing the printing technique used; it relies on a support vector machine trained to distinguish laser from inkjet printouts.

Abstract: The detection of counterfeits in printed documents is currently based mainly on built-in security features or on human expertise. We propose a classification system that helps non-expert users distinguish original documents from PC-made forgeries by analyzing the printing technique used. Each letter in a document is classified using a support vector machine that has been trained to distinguish laser from inkjet printouts. A color-coded visualization helps the user to interpret the per-letter classification results.

48 citations
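
Per-letter classification with a support vector machine can be sketched with a linear SVM trained by Pegasos-style stochastic subgradient descent. The two input features per letter, the +1 = inkjet / -1 = laser convention and the toy training data in the test are assumptions for illustration; the abstract does not state the paper's feature set or SVM kernel.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    return list(x) + [1.0]  # constant feature acts as a (regularized) bias


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM; y holds +1/-1."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss is active: step towards the example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def classify_letter(w, features):
    """Hypothetical convention: +1 = inkjet, -1 = laser."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(features)))
    return 1 if score > 0 else -1
```

A color-coded view in the spirit of the abstract could then paint each letter's bounding box according to the sign (or magnitude) of its score.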

Proceedings Article

Distance measures for layout-based document image retrieval

J. van Beusekom¹, Daniel Keysers¹, Faisal Shafait¹, Thomas M. Breuel¹

Kaiserslautern University of Technology¹

27 Apr 2006

TL;DR: A new class of distance measures based on a two-step procedure is introduced for documents with Manhattan layouts; the experiments show that the best distance measure for this task uses the overlapping area combined with the Manhattan distance of the corner points as block distance, together with minimum weight edge cover matching.

Abstract: Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: First, the distances between the blocks of two layouts are calculated. Then, the blocks of one layout are assigned to the blocks of the other layout in a matching step. Different block distances and matching methods are compared and evaluated using the publicly available MARG database. On this dataset, the layout type can be determined successfully in 92.6% of the cases using the best distance measure in a nearest neighbor classifier. The experiments show that the best distance measure for this task is the overlapping area combined with the Manhattan distance of the corner points as block distance together with the minimum weight edge cover matching.

47 citations
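
The two-step distance can be sketched directly: a block distance between rectangles, then a matching that assigns the blocks of one layout to the blocks of the other. Below, the block distance adds the Manhattan distance of the corner points to the non-overlapping area (an assumed way of combining the two terms), and the minimum weight edge cover is found by brute force, which is only feasible for a handful of blocks; a practical implementation would use a polynomial-time edge cover algorithm.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with x0 < x1, y0 < y1


def area(a: Block) -> int:
    return (a[2] - a[0]) * (a[3] - a[1])


def overlap(a: Block, b: Block) -> int:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def block_distance(a: Block, b: Block) -> int:
    """Manhattan distance of the two corner points plus the non-overlapping area
    (an assumed combination of the two terms)."""
    corners = sum(abs(p - q) for p, q in zip(a, b))
    return corners + area(a) + area(b) - 2 * overlap(a, b)


def min_edge_cover(A: List[Block], B: List[Block]) -> float:
    """Brute-force minimum weight edge cover of the complete bipartite graph A x B:
    every block of both (non-empty) layouts must be matched at least once."""
    edges = [(i, j, block_distance(a, b)) for i, a in enumerate(A) for j, b in enumerate(B)]
    best = float("inf")
    for mask in range(1, 1 << len(edges)):
        chosen = [e for k, e in enumerate(edges) if mask >> k & 1]
        if ({i for i, _, _ in chosen} == set(range(len(A)))
                and {j for _, j, _ in chosen} == set(range(len(B)))):
            best = min(best, sum(d for _, _, d in chosen))
    return best
```

A nearest neighbor classifier as described in the abstract would compute `min_edge_cover` against every labeled layout and return the label of the closest one.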

Proceedings Article

Layout Analysis of Urdu Document Images

Faisal Shafait¹, Adnan-ul-Hasan¹, Daniel Keysers¹, Thomas M. Breuel²

German Research Centre for Artificial Intelligence¹, Kaiserslautern University of Technology²

01 Dec 2006

TL;DR: A layout analysis system for extracting text-lines in reading order from Urdu document images shows high text-line detection accuracy on scanned images of Urdu prose and poetry books and magazines and works reasonably well on newspaper images.

Abstract: Layout analysis is a key component of an OCR system. In this paper, we present a layout analysis system for extracting text-lines in reading order from Urdu document images. For this purpose, we evaluate an existing system for Roman script text on Urdu documents and describe its methods and the main changes necessary to adapt it to Urdu script. The main changes are: 1) the text-line model for Roman script is modified to adapt to Urdu text, and 2) the reading order of an Urdu document is defined. The method is applied to a collection of scanned Urdu documents from various books, magazines, and newspapers. The results show high text-line detection accuracy on scanned images of Urdu prose and poetry books and magazines. The algorithm also works reasonably well on newspaper images. We also identify directions for future work which may further improve the accuracy of the system.

35 citations
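
Defining reading order is one of the two changes listed in this abstract. A minimal sketch for a right-to-left script, under a simplified and hypothetical rule (lines that overlap horizontally form a column; columns are read right to left, lines within a column top to bottom), looks like this. It is not the paper's definition, which the abstract does not spell out; in particular it ignores elements that span several columns.

```python
from typing import List, Tuple

Line = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def reading_order_rtl(lines: List[Line]) -> List[Line]:
    """Order text-line boxes for a right-to-left script (simplified rule)."""
    columns: List[list] = []  # each entry: [x0, x1, member lines]
    for box in sorted(lines, key=lambda b: -b[2]):  # rightmost lines first
        for col in columns:
            if box[0] < col[1] and box[2] > col[0]:  # horizontal overlap
                col[0], col[1] = min(col[0], box[0]), max(col[1], box[2])
                col[2].append(box)
                break
        else:
            columns.append([box[0], box[2], [box]])
    columns.sort(key=lambda c: -c[1])  # columns right to left
    return [b for col in columns for b in sorted(col[2], key=lambda b: b[1])]
```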

Book Chapter

Automated Feature Selection for the Classification of Meningioma Cell Nuclei

Oliver Wirjadi¹,², Thomas M. Breuel¹, Wolfgang Feiden³, Yoo-Jin Kim³

Kaiserslautern University of Technology¹, Fraunhofer Institute for Industrial Mathematics², Saarland University³

01 Jan 2006

TL;DR: A supervised learning method for image classification is presented which is independent of the type of images that will be processed by constructing a large base of grey-value and colour based image features and relying on a decision tree to choose the features that are most relevant for a given application.

Abstract: A supervised learning method for image classification is presented which is independent of the type of images that will be processed. This is realized by constructing a large base of grey-value and colour based image features. We then rely on a decision tree to choose the features that are most relevant for a given application. We apply and evaluate our system on the classification task of meningioma cells.

18 citations
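
Choosing features with a decision tree amounts to letting the split criterion decide which features are informative. The sketch below ranks features by the information gain of their best single threshold, which is the criterion an information-gain decision tree applies when choosing its root split; a full tree repeats this recursively, and the paper may use a different tree-induction method. The toy data in the test are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence) -> float:
    """Best information gain over all thresholds on one feature."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal values
        left = [l for _, l in pairs[:k]]
        right = [l for _, l in pairs[k:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = max(best, gain)
    return best


def rank_features(X: List[Sequence[float]], y: Sequence) -> List[int]:
    """Feature indices sorted by the gain of their best root split."""
    gains = [best_split([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: -gains[j])
```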

Book Chapter

Satellite tracks removal in astronomical images

Haider Ali¹, Christoph H. Lampert¹, Thomas M. Breuel¹

German Research Centre for Artificial Intelligence¹

14 Nov 2006

TL;DR: A new system for “Finding Satellite Tracks” in astronomical images is described, based on the geometric matching method “Recognition by Adaptive Subdivision of Transformation Space (RAST)”.

Abstract: This paper describes a new system for “Finding Satellite Tracks” in astronomical images based on a modern geometric approach. There is an increasing need for methods with a solid mathematical and statistical foundation in astronomical image processing. Just as computational methods serve all disciplines of science, they are becoming popular in astronomy as well. Currently, different computational systems must be numerically optimized before they can be applied to astronomical images, so at present there is no single system that solves astronomers' problems using computational methods based on modern approaches. The system “Finding Satellite Tracks” is based on the geometric matching method “Recognition by Adaptive Subdivision of Transformation Space (RAST)”.

11 citations
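
RAST searches the space of model parameters with branch and bound: it repeatedly splits a box of parameters, computes an upper bound on how many points any model in the box can explain, and explores the most promising box first. The sketch below applies that idea to a satellite track viewed as a straight line through candidate bright-pixel coordinates. The point input, the line parameterization x cos θ + y sin θ = d, and the tolerances are assumptions, not details taken from the paper. The bound uses the fact that the residual changes by at most hypot(x, y) per radian of θ and by at most the change in d.

```python
import heapq
import math


def _bound(points, box, eps):
    """Upper bound on points within eps of any line (theta, d) in the box."""
    t0, t1, d0, d1 = box
    tc, dc = (t0 + t1) / 2, (d0 + d1) / 2
    c, s = math.cos(tc), math.sin(tc)
    count = 0
    for x, y in points:
        # |d/dtheta (x cos t + y sin t)| <= hypot(x, y), so the residual can
        # change by at most this much anywhere inside the box
        slack = math.hypot(x, y) * (t1 - t0) / 2 + (d1 - d0) / 2
        if abs(x * c + y * s - dc) <= eps + slack:
            count += 1
    return count


def rast_line(points, eps=0.5, d_max=20.0, t_res=1e-3, d_res=1e-2):
    """Best-first branch and bound for the line x cos t + y sin t = d that is
    supported by the most points; returns (theta, d, support)."""
    root = (0.0, math.pi, -d_max, d_max)
    heap = [(-_bound(points, root, eps), 0, root)]  # ties: deeper boxes first
    while heap:
        neg_bound, neg_depth, box = heapq.heappop(heap)
        t0, t1, d0, d1 = box
        if t1 - t0 <= t_res and d1 - d0 <= d_res:
            return (t0 + t1) / 2, (d0 + d1) / 2, -neg_bound
        if (t1 - t0) / t_res >= (d1 - d0) / d_res:  # split the relatively larger side
            tm = (t0 + t1) / 2
            children = [(t0, tm, d0, d1), (tm, t1, d0, d1)]
        else:
            dm = (d0 + d1) / 2
            children = [(t0, t1, d0, dm), (t0, t1, dm, d1)]
        for child in children:
            heapq.heappush(heap, (-_bound(points, child, eps), neg_depth - 1, child))
    return None
```

Because the first box popped at full resolution has a bound at least as large as every other box still in the queue, the result is optimal up to the chosen resolution; pixels near the found line could then be masked out to remove the track.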

Proceedings Article

Color image dequantization by constrained diffusion

Daniel Keysers¹, Christoph H. Lampert¹, Thomas M. Breuel¹

Kaiserslautern University of Technology¹

16 Jan 2006, Electronic Imaging

TL;DR: A simple and effective method for the dequantization of color images, effectively interpolating the colors from quantized levels to a continuous range of brightness values is proposed.

Abstract: We propose a simple and effective method for the dequantization of color images, effectively interpolating the colors from quantized levels to a continuous range of brightness values. The method is designed to be applied to images that either have undergone a manipulation like image brightness adjustment or are going to be processed in such a way. Such operations often cause noticeable color bands in the images that can be reduced using the proposed Constrained Diffusion technique. We demonstrate the advantages of our method using synthetic and real-life images as examples. We also present quantitative results using 8-bit data that has been obtained from original 12-bit sensor data and obtain substantial gains in PSNR using the proposed method.

11 citations
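
The essential constraint in constrained diffusion is that smoothing must never move a value outside the interval its quantized level could have come from. A one-dimensional sketch, assuming uniform quantization with a known step and a simple explicit diffusion step followed by clamping, is shown below; the paper works on color images and its exact diffusion scheme may differ.

```python
from typing import List


def dequantize_1d(q: List[float], step: float, iters: int = 5000) -> List[float]:
    """Smooth a quantized signal (at least two samples) while keeping each sample
    inside the interval [q - step/2, q + step/2] that its quantized value allows."""
    lo = [v - step / 2 for v in q]
    hi = [v + step / 2 for v in q]
    y = list(q)
    n = len(y)
    for _ in range(iters):
        new = []
        for i in range(n):
            left = y[i - 1] if i > 0 else y[i + 1]  # mirrored boundary
            right = y[i + 1] if i < n - 1 else y[i - 1]
            v = (left + 2 * y[i] + right) / 4  # one stable diffusion step
            new.append(min(max(v, lo[i]), hi[i]))  # enforce the constraint
        y = new
    return y
```

On a quantized ramp, the staircase relaxes towards a straight line while every sample stays consistent with its quantized value; in an image version, the same clamp would be applied per pixel and per color channel.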

Journal Article

Optimal line and arc detection on run-length representations

Daniel Keysers, Thomas M. Breuel

01 Jan 2006, Lecture Notes in Computer Science

TL;DR: In this article, a branch-and-bound approach is proposed for the globally optimal detection of line and arc primitives using runs of black pixels in a bi-level image.

Abstract: The robust detection of lines and arcs in scanned documents or technical drawings is an important problem in document image understanding. We present a new solution to this problem that works directly on run-length encoded data. The method finds globally optimal solutions to parameterized thick line and arc models. Line thickness is part of the model and directly used during the matching process. Unlike previous approaches, it requires no thinning or other preprocessing steps, no computation of line adjacency graphs, and no heuristics. Furthermore, the only search-related parameter that needs to be specified is the desired numerical accuracy of the solution. The method is based on a branch-and-bound approach for the globally optimal detection of these geometric primitives using runs of black pixels in a bi-level image. We present qualitative and quantitative results of the algorithm on images used in the 2003 and 2005 GREC arc segmentation contests.

9 citations
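
This line and arc detector works directly on run-length encoded data, that is, on horizontal runs of black pixels. The sketch below shows only that input representation, converting a small bi-level image (1 = black) into (row, first column, last column) runs; the function name is hypothetical, and the branch-and-bound matching on these runs is beyond this sketch.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, first column, last column), inclusive


def black_runs(image: List[List[int]]) -> List[Run]:
    """Run-length encode the black (1) pixels of a bi-level image row by row."""
    runs = []
    for r, row in enumerate(image):
        start = None
        for c, px in enumerate(row + [0]):  # trailing 0 closes an open run
            if px and start is None:
                start = c
            elif not px and start is not None:
                runs.append((r, start, c - 1))
                start = None
    return runs
```

A thick horizontal stroke collapses into one run per row, which is why matching on runs needs far fewer primitives than matching on individual pixels.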