Author

Ines Ben Messaoud

Bio: Ines Ben Messaoud is an academic researcher from the University of Sfax. The author has contributed to research on the topics of XML validation and Document Structure Description. The author has an h-index of 8 and has co-authored 21 publications receiving 160 citations. Previous affiliations of Ines Ben Messaoud include the University of Gabès and Braunschweig University of Technology.

Papers
Proceedings ArticleDOI
18 Sep 2011
TL;DR: A new binarization approach that combines a preprocessing step with a localization step, so that binarization algorithms are applied only to selected objects of interest.
Abstract: Document analysis and recognition systems usually include several levels: annotation, preprocessing, segmentation, feature extraction, classification and post-processing. Each level may be dependent on or independent of the others. The presence of noise in images can affect the performance of the entire system; this noise can be introduced by the digitization step or come from the document itself. In this paper, we present a new binarization approach based on a combination of a preprocessing step and a localization step. The aim of the approach is to apply binarization algorithms only to selected objects of interest. The developed approach is evaluated on two benchmarking datasets from the last two document binarization contests (DIBCO 2009 and H-DIBCO 2010) and shows very promising results.

31 citations
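The paper's core idea, binarizing only localized objects of interest rather than the whole page, can be illustrated with a minimal sketch. The Otsu thresholding method and the hard-coded rectangular region below are illustrative assumptions, not the authors' actual localization or binarization algorithms.

```python
# Sketch: apply a binarization algorithm only inside a localized
# region of interest (ROI), leaving the rest of the page untouched.
# Otsu's method and the rectangular ROI are placeholder choices.

def otsu_threshold(pixels):
    """Classic Otsu threshold over a flat list of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize_roi(image, roi):
    """Binarize only the pixels inside roi = (top, left, bottom, right)."""
    top, left, bottom, right = roi
    region = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    t = otsu_threshold(region)
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 0 if image[y][x] <= t else 255
    return out
```

Restricting the threshold estimation to the ROI is what makes the idea attractive: the histogram is not polluted by noise outside the object of interest.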

Proceedings ArticleDOI
16 Sep 2011
TL;DR: The proposed framework consists of two phases, selection and evaluation, in which one or more preprocessing methods are assigned to each book of the database and the assignments are then validated.
Abstract: The objective of document preprocessing is to ease the text recognition or document indexing processes. The analysis of historical documents is a big challenge because most of these documents are noisy and exhibit many degradations. In this paper we propose a preprocessing framework for a large dataset of historical documents. The proposed framework consists of two phases, selection and evaluation. During the first phase, one or more methods are assigned to each book of the database; the selection results are then validated during the evaluation phase. The experiments are applied to printed and handwritten documents extracted from the Google Books and Bayerische Staatsbibliothek databases, respectively. The results obtained during the evaluation are very promising.

21 citations
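The two-phase structure described above can be sketched in a few lines: a selection phase assigns the best-scoring preprocessing method to each book, and an evaluation phase validates the assignment on held-out pages. The candidate methods, the scoring function and the acceptance threshold are all placeholder assumptions, not the paper's actual components.

```python
# Sketch of a two-phase selection/evaluation framework for assigning
# preprocessing methods per book. Scoring is abstracted into a
# user-supplied score(method, page) function.

def select_methods(books, methods, score):
    """Phase 1: assign the method with the best mean score to each book.

    books: {book_name: [page, ...]}; methods: list of method names.
    """
    assignment = {}
    for book, pages in books.items():
        best = max(methods,
                   key=lambda m: sum(score(m, p) for p in pages) / len(pages))
        assignment[book] = best
    return assignment

def evaluate(assignment, holdout, score, threshold=0.5):
    """Phase 2: validate each assignment on held-out pages."""
    return {book: sum(score(m, p) for p in holdout[book]) / len(holdout[book]) >= threshold
            for book, m in assignment.items()}
```

Separating selection from evaluation mirrors the paper's design: the evaluation phase catches books for which the selected method does not generalize beyond the sample pages.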

Journal ArticleDOI
TL;DR: This paper proposes a two-method approach to build Document Warehouse conceptual schemas: the first method unifies XML document structures, elaborating a global and generic view for a set of XML documents belonging to the same domain; the second designs multidimensional galaxy schemas.
Abstract: Data Warehouses and OLAP (On Line Analytical Processing) technologies are dedicated to analyzing structured data issued from organizations' OLTP (On Line Transaction Processing) systems. Furthermore, in order to enhance their decision support systems, these organizations need to explore XML (eXtensible Markup Language) documents as an additional and important source of unstructured data. In this context, this paper addresses the warehousing of document-centric XML documents. More specifically, we propose a two-method approach to build Document Warehouse conceptual schemas. The first method is for the unification of XML document structures; it aims to elaborate a global and generic view for a set of XML documents belonging to the same domain. The second method is for designing multidimensional galaxy schemas for Document Warehouses.

20 citations
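One way to picture the unification step is to collect the element paths of several same-domain documents and merge them into a single global set of paths. This toy sketch is a stand-in for the paper's unification method, not the authors' algorithm.

```python
# Illustrative sketch: derive a "global and generic view" of a set of
# same-domain XML documents as the union of their element paths.
import xml.etree.ElementTree as ET

def element_paths(xml_text):
    """Return the set of root-to-element paths in one document."""
    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths = {path}
        for child in node:
            paths |= walk(child, path)
        return paths
    return walk(ET.fromstring(xml_text), "")

def unified_view(documents):
    """Union of all documents' paths: one structure covering the set."""
    view = set()
    for doc in documents:
        view |= element_paths(doc)
    return view
```

For two article documents where one has an abstract and the other has keywords, the unified view covers both optional elements, which is the kind of global structure a Document Warehouse schema can then be designed from.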

Proceedings ArticleDOI
18 Sep 2012
TL;DR: A multilevel segmentation framework for handwritten historical documents is proposed, in which one or more segmentation methods are selected according to the input document's features.
Abstract: Text-line segmentation is a crucial step of document analysis and recognition systems because its output serves as the input of the recognition stage. Because the same handwritten page can exhibit varying characteristics, we propose in this paper a multilevel segmentation framework for handwritten historical documents. In this framework, one or more segmentation methods are selected according to the input document's features. The framework is tested on the IAM historical database (60 images) and on images from the handwritten document segmentation competition held at ICFHR 2010. The evaluation of the segmentation framework is based on several metrics; the tests show that the proposed framework gives promising results.

15 citations
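A common baseline that such a framework might select for clean, well-separated pages is text-line segmentation by horizontal projection profiles: rows containing ink form peaks, and gaps between peaks separate lines. This is an illustrative baseline, not one of the paper's selected methods.

```python
# Sketch: text-line segmentation via horizontal projection profile.
# Consecutive rows containing ink form one line band.

def text_line_bands(binary_image):
    """Return (start_row, end_row) bands of consecutive rows with ink.

    binary_image: list of rows, 1 = ink pixel, 0 = background.
    """
    profile = [sum(row) for row in binary_image]
    bands, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y
        elif ink == 0 and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

Projection profiles fail on skewed or touching lines, which is precisely why a multilevel framework that picks a method per document's features is useful.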

Proceedings ArticleDOI
17 Sep 2011
TL;DR: A method for selecting the input parameters of binarization methods according to the noise type detected in the image, tested on the benchmarking datasets used at DIBCO 2009 and H-DIBCO 2010.
Abstract: Historical documents generally contain different kinds of degradation. Due to these degradations, applying noise removal methods during a preprocessing stage is necessary. Since the noise that exists in the original document cannot be eliminated by a simple noise removal algorithm and influences the preprocessing result, e.g. the binarization, a noise detection function is needed. We present in this paper a method for selecting the input parameters of binarization methods according to the noise type detected in the image. The tests are carried out on the benchmarking datasets used at DIBCO 2009 and H-DIBCO 2010. The results returned by the binarization methods using the noise features are promising.

14 citations
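The core idea, detect the dominant noise type first and choose the binarization parameters accordingly, amounts to a parameter lookup. The noise labels and parameter values below are made-up placeholders; the paper derives its choices from noise features measured on the image.

```python
# Sketch: map a detected noise type to input parameters for a local
# thresholding method (e.g. window size and k for a Sauvola-style
# binarizer). All labels and values here are hypothetical.

PARAMS_BY_NOISE = {
    "clean":         (15, 0.20),
    "bleed_through": (31, 0.35),
    "stains":        (41, 0.30),
}

def select_parameters(noise_type):
    """Fall back to the 'clean' settings for unknown noise types."""
    return PARAMS_BY_NOISE.get(noise_type, PARAMS_BY_NOISE["clean"])
```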


Cited by
Proceedings ArticleDOI
18 Sep 2011
TL;DR: The contest details including the evaluation measures used as well as the performance of the 18 submitted methods are described along with a short description of each method.
Abstract: DIBCO 2011 is the International Document Image Binarization Contest organized in the context of ICDAR 2011 conference. The general objective of the contest is to identify current advances in document image binarization for both machine-printed and handwritten document images using evaluation performance measures that conform to document image analysis and recognition. This paper describes the contest details including the evaluation measures used as well as the performance of the 18 submitted methods along with a short description of each method.

202 citations

Posted Content
TL;DR: This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets to summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline.
Abstract: Machine learning (ML) has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically, without extensive knowledge of statistics and machine learning. This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets. Driven by the frameworks selected for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.

162 citations
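What AutoML frameworks automate can be shown in miniature: enumerate candidate models, fit each on training data, and keep the one with the lowest validation error. The two candidate models below are toy assumptions; real frameworks search far larger spaces of preprocessing steps, models and hyperparameters.

```python
# Toy model-selection loop in the spirit of AutoML: fit every candidate
# and keep the one with the lowest validation mean squared error.

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(candidates, train, valid):
    """Return the fitted model with the lowest validation MSE."""
    def mse(model, data):
        return sum((model(x) - y) ** 2 for x, y in data) / len(data)
    best_model, best_err = None, float("inf")
    for fit in candidates:
        model = fit([x for x, _ in train], [y for _, y in train])
        err = mse(model, valid)
        if err < best_err:
            best_model, best_err = model, err
    return best_model
```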

Journal ArticleDOI
TL;DR: This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images using a weighting scheme that diminishes any potential evaluation bias.
Abstract: Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.

139 citations
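The unweighted pixel-based recall and precision that the paper starts from can be sketched directly; the paper's contribution is to weight these counts so that, for example, pixels near character borders do not bias the evaluation. The weighting scheme itself is omitted here.

```python
# Baseline pixel-based recall and precision for a binarization result
# against ground truth. Both inputs are flat lists of 0/1 pixels,
# with 1 = foreground (text).

def pixel_recall_precision(ground_truth, result):
    tp = sum(1 for g, r in zip(ground_truth, result) if g == 1 and r == 1)
    fn = sum(1 for g, r in zip(ground_truth, result) if g == 1 and r == 0)
    fp = sum(1 for g, r in zip(ground_truth, result) if g == 0 and r == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

Recall penalizes missed text (broken characters), while precision penalizes false alarms (background noise), matching two of the additional performance metrics the paper reports.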

Journal ArticleDOI
TL;DR: A combination of a global and a local adaptive binarization method at the connected-component level is proposed that aims at improved overall performance; it achieves top performance after extensive testing on the DIBCO (Document Image Binarization Contest) series datasets.

111 citations

Journal ArticleDOI
TL;DR: The proposed LBPruns and COLD features are texture-based, curvature-free features that capture the line information of handwritten texts rather than curvature information; they provide a significant improvement on the CERUG data set.

82 citations