Author

Rolf Ingold

Other affiliations: University of Freiburg
Bio: Rolf Ingold is an academic researcher at the University of Fribourg. He has contributed to research on topics including historical documents and image segmentation, has an h-index of 25, and has co-authored 182 publications receiving 2,469 citations. Previous affiliations of Rolf Ingold include the University of Freiburg.


Papers
Proceedings ArticleDOI
26 Jul 2009
TL;DR: The purpose of this database is the large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style text recognition systems in Arabic.
Abstract: We report on the creation of a database composed of images of Arabic printed words. The purpose of this database is the large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style text recognition systems in Arabic. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. Particular attention is given to low-resolution images, where anti-aliasing introduces noise into the characters to be recognized. The database is synthetically generated using a lexicon of 113,284 words, 10 Arabic fonts, 10 font sizes and 4 font styles. The database contains 45,313,600 single-word images, totaling more than 250 million characters. Ground-truth annotation is provided for each image. The database is called APTI, for Arabic Printed Text Images.
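The image count follows directly from the generation grid described in the abstract: every lexicon word is rendered in every combination of font, size, and style. A minimal sketch, using the counts from the abstract (the font, size, and style names below are illustrative placeholders, not the actual APTI choices):

```python
# Sketch of the APTI generation grid: each (word, font, size, style)
# tuple yields one single-word image. Counts come from the abstract;
# the font/size/style entries are placeholder values for illustration.
from itertools import product

LEXICON_SIZE = 113_284                                   # words in the lexicon
FONTS = [f"font_{i}" for i in range(10)]                 # 10 Arabic fonts (placeholders)
SIZES = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]           # 10 font sizes (illustrative)
STYLES = ["plain", "bold", "italic", "bold-italic"]      # 4 font styles (illustrative)

def num_word_images() -> int:
    """One image per (word, font, size, style) combination."""
    variants_per_word = sum(1 for _ in product(FONTS, SIZES, STYLES))
    return LEXICON_SIZE * variants_per_word

print(num_word_images())  # 113,284 x 10 x 10 x 4 = 45,313,600 images
```

This matches the 45,313,600 single-word images stated in the abstract.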

130 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: This paper considers page segmentation as a pixel labeling problem, i.e., each pixel is classified as either periphery, background, text block, or decoration, and applies convolutional autoencoders to learn features directly from pixel intensity values.
Abstract: In this paper, we present an unsupervised feature learning method for page segmentation of historical handwritten documents available as color images. We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as either periphery, background, text block, or decoration. Traditional methods in this area rely on carefully hand-crafted features or large amounts of prior knowledge. In contrast, we apply convolutional autoencoders to learn features directly from pixel intensity values. Then, using these features to train an SVM, we achieve high quality segmentation without any assumption of specific topologies and shapes. Experiments on three public datasets demonstrate the effectiveness and superiority of the proposed approach.
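The pipeline described above (patch around each pixel, learned features, SVM decision) can be sketched in a few lines of numpy. This is a toy illustration of the data flow only: the encoder weights below are random placeholders standing in for a trained convolutional autoencoder, and the linear scorer stands in for the trained SVM.

```python
# Minimal numpy sketch of pixel labeling: extract a patch around each
# pixel, encode it into a feature vector, and score it against the four
# classes (periphery, background, text block, decoration).
# All weights are random placeholders; a real system would train the
# autoencoder on unlabeled patches and the SVM on labeled features.
import numpy as np

CLASSES = ["periphery", "background", "text_block", "decoration"]
PATCH = 5          # odd patch side length, so the pixel is centered
FEATURES = 16      # size of the learned feature vector

rng = np.random.default_rng(0)
encoder_W = rng.normal(size=(FEATURES, PATCH * PATCH))  # stand-in for trained encoder
svm_W = rng.normal(size=(len(CLASSES), FEATURES))       # stand-in for trained SVM

def label_pixels(page: np.ndarray) -> np.ndarray:
    """Assign a class index to every pixel of a grayscale page image."""
    pad = PATCH // 2
    padded = np.pad(page, pad, mode="edge")
    h, w = page.shape
    labels = np.empty((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + PATCH, x:x + PATCH].ravel()
            features = np.tanh(encoder_W @ patch)   # encode patch -> features
            scores = svm_W @ features               # one score per class
            labels[y, x] = int(np.argmax(scores))
    return labels

page = rng.random((8, 8))       # toy 8x8 "page"
labels = label_pixels(page)
print(labels.shape)             # one class index per pixel
```

The design point of the paper is that `encoder_W` is learned from raw pixel intensities rather than hand-crafted, so no prior knowledge about page topology is assumed.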

110 citations

Proceedings ArticleDOI
01 Oct 2016
TL;DR: A publicly available historical manuscript database DIVA-HisDB is introduced for the evaluation of several Document Image Analysis (DIA) tasks and a layout analysis ground-truth which has been iterated on, reviewed, and refined by an expert in medieval studies is provided.
Abstract: This paper introduces a publicly available historical manuscript database DIVA-HisDB for the evaluation of several Document Image Analysis (DIA) tasks. The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout analysis ground-truth which has been iterated on, reviewed, and refined by an expert in medieval studies. DIVA-HisDB and the ground truth can be used for training and evaluating DIA tasks, such as layout analysis, text line segmentation, binarization and writer identification. Layout analysis results of several representative baseline technologies are also presented in order to help researchers evaluate their methods and advance the frontiers of complex historical manuscripts analysis. An optimized state-of-the-art Convolutional Auto-Encoder (CAE) performs with around 95% accuracy, demonstrating that for this challenging layout there is much room for improvement. Finally, we show that existing text line segmentation methods fail due to interlinear and marginal text elements.

82 citations

Journal ArticleDOI
TL;DR: A new font and size identification method for ultra-low-resolution Arabic word images is proposed, based on a stochastic approach; it improves the word recognition rate by about 23% over the global multi-font system on the Arabic Printed Text Image database.

75 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: In this article, a simple CNN with only one convolutional layer is proposed to learn features from raw image pixels; it achieves competitive results against deeper architectures on several public datasets.
Abstract: This paper presents a page segmentation method for handwritten historical document images based on a Convolutional Neural Network (CNN). We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as one of the predefined classes. Traditional methods in this area rely on hand-crafted features carefully tuned considering prior knowledge. In contrast, we propose to learn features from raw image pixels using a CNN. While many researchers focus on developing deep CNN architectures to solve different problems, we train a simple CNN with only one convolution layer. We show that the simple architecture achieves competitive results against other deep architectures on different public datasets. Experiments also demonstrate the effectiveness and superiority of the proposed method compared to previous methods.
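The "one convolutional layer" idea above can be sketched as a forward pass: a patch around a pixel is convolved with a small filter bank, pooled, and fed to a softmax over the predefined classes. A minimal sketch with random placeholder weights (a real model would learn them from labeled pages; filter counts and sizes here are illustrative, not the paper's values):

```python
# Sketch of a single-conv-layer classifier for one pixel patch:
# convolution + ReLU, global max pooling, then a softmax over classes.
# Weights are random placeholders for illustration only.
import numpy as np

N_CLASSES = 4      # e.g. periphery / background / text block / decoration
N_FILTERS = 8      # illustrative filter count
K = 3              # filter size
PATCH = 7          # input patch size

rng = np.random.default_rng(1)
filters = rng.normal(size=(N_FILTERS, K, K)) * 0.1
fc_W = rng.normal(size=(N_CLASSES, N_FILTERS)) * 0.1

def conv2d_valid(x: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Valid 2-D cross-correlation of a patch with one filter."""
    out = np.empty((x.shape[0] - K + 1, x.shape[1] - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * f)
    return out

def classify_patch(patch: np.ndarray) -> np.ndarray:
    """Return softmax class probabilities for one grayscale patch."""
    maps = np.array([np.maximum(conv2d_valid(patch, f), 0) for f in filters])
    pooled = maps.max(axis=(1, 2))          # global max pooling per filter
    logits = fc_W @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = classify_patch(rng.random((PATCH, PATCH)))
print(probs.sum())   # class probabilities sum to 1
```

With only one convolutional layer, the learnable parameters are the filter bank and the final linear layer, which is what keeps the architecture simple compared to deep alternatives.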

74 citations


Cited by
Reference EntryDOI
15 Oct 2004

2,118 citations

01 Jan 1979
TL;DR: This special issue gathers recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, with an emphasis on real-world applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain lots of training data while many classes contain only a small amount. How to use frequent classes to help learn rare classes, for which training data is harder to collect, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision and multimedia analysis. Different levels of components can be shared during concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters and training examples. Regarding specific methods, multi-task learning, transfer learning and deep learning can be seen as different strategies for sharing information. These methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged.
Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Book
01 Dec 2006
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract (contents):
1. Introduction to text mining
2. Core text mining operations
3. Text mining preprocessing techniques
4. Categorization
5. Clustering
6. Information extraction
7. Probabilistic models for information extraction
8. Preprocessing applications using probabilistic and hybrid approaches
9. Presentation-layer considerations for browsing and query refinement
10. Visualization approaches
11. Link analysis
12. Text mining applications
Appendix. Bibliography.

1,628 citations

Book ChapterDOI
01 Jan 1996
TL;DR: Exploring and identifying structure is even more important for multivariate data than univariate data, given the difficulties in graphically presenting multivariateData and the comparative lack of parametric models to represent it.
Abstract: Exploring and identifying structure is even more important for multivariate data than univariate data, given the difficulties in graphically presenting multivariate data and the comparative lack of parametric models to represent it. Unfortunately, such exploration is also inherently more difficult.

920 citations