Author

Frank Lebourgeois

Bio: Frank Lebourgeois is an academic researcher from the University of Lyon. The author has contributed to research in topics: Image segmentation & Optical character recognition. The author has an h-index of 13 and has co-authored 58 publications receiving 683 citations. Previous affiliations of Frank Lebourgeois include the Institut national des sciences appliquées and the Institut national des sciences appliquées de Lyon.


Papers
Journal ArticleDOI
TL;DR: A text search algorithm designed for ancient manuscripts is introduced, based on differential features compared with a cohesive elastic matching method that operates only on zones of interest, so that just the informative parts of the words are matched.

Abstract: In this article, we introduce a text search algorithm designed for ancient manuscripts. Word-spotting is the best alternative to word recognition on this type of document. Our method is based on differential features compared with a cohesive elastic matching method that operates on zones of interest, so that only the informative parts of the words are matched. This improves both the accuracy and the runtime of the word-spotting process. The proposed method is tested on medieval manuscripts in Latin and Semitic alphabets as well as on more recent manuscripts.
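As a rough illustration of the matching step (not the authors' implementation): the sketch below reduces each word image to a sequence of column-wise differential features and compares sequences with dynamic time warping, a common stand-in for elastic matching. The feature choice is a simplified assumption, and the zone-of-interest selection is omitted.

```python
import numpy as np

def column_features(word_img):
    """Toy differential features: mean vertical-gradient magnitude per
    column. A simplified stand-in for the paper's differential features."""
    grad = np.abs(np.diff(word_img.astype(float), axis=0))
    return grad.mean(axis=0)  # one value per image column

def dtw_distance(a, b):
    """Dynamic time warping between two feature sequences, playing the
    role of the elastic matching step (length-normalised)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def spot(query_img, candidates):
    """Rank candidate word images by elastic distance to the query."""
    q = column_features(query_img)
    return sorted((dtw_distance(q, column_features(c)), i)
                  for i, c in enumerate(candidates))
```

Restricting the comparison to zones of interest, as the paper does, would replace the full column sequences with only the columns around selected keypoints.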

124 citations

Journal ArticleDOI
TL;DR: The first method that allows the indexing of ancient manuscripts of any language and alphabet is introduced; it requires no layout segmentation and uses features suited to any type of alphabet and writing.

Abstract: In this article, we introduce the first method that allows the indexing of ancient manuscripts of any language and alphabet. We describe a word retrieval engine inspired by recent word-spotting advances on ancient manuscripts. Our approach does not need any layout segmentation and makes use of features suited to any type of alphabet (Latin, Arabic, Chinese, etc.) and writing. The engine is tested on numerous documents and in several use-cases.
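In the simplest reading, a segmentation-free retrieval pass slides a query-sized window over the page and keeps the best-scoring positions. The sketch below illustrates that idea only; the stride, window shape, and score function (e.g. the DTW sketch above) are illustrative choices, not the engine's actual internals.

```python
import numpy as np

def sliding_matches(page_img, query_img, score_fn, stride=4, keep=10):
    """Scan a whole page with a query-sized window -- no layout
    segmentation. score_fn(window, query) returns a distance
    (lower is better), e.g. the DTW sketch above."""
    qh, qw = query_img.shape
    hits = []
    for y in range(0, page_img.shape[0] - qh + 1, stride):
        for x in range(0, page_img.shape[1] - qw + 1, stride):
            window = page_img[y:y + qh, x:x + qw]
            hits.append((score_fn(window, query_img), (x, y)))
    hits.sort(key=lambda h: h[0])
    return hits[:keep]  # best-scoring page locations
```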

115 citations

Book
05 Jul 2003
TL;DR: In this book, the authors present the models most commonly used to describe what an image is, together with the processing and analysis methods that follow from these models: image enhancement, restoration, thresholding, edge detection, and region extraction.

Abstract: The image has become the principal medium of information. The processing of still and moving images is consequently a rapidly expanding research field with ever more numerous applications. This book presents image processing in both theory and practice, through signal-based, statistical, functional, geometric, and set-theoretic approaches. It covers the models most commonly used to describe what an image is, together with the processing and analysis methods that follow from these models: image enhancement, restoration, thresholding, edge detection, and region extraction. The special case of binary images is treated, in particular coding and mathematical morphology, together with the problem of image comparison. Numerous examples, images, and diagrams support the presentation.
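Among the topics the book lists, thresholding is easy to make concrete. Below is a minimal Otsu thresholding sketch in Python; it is one classical method of the family the book covers, not an excerpt from it.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximising the between-class
    variance of the grey-level histogram. Expects an 8-bit image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability
    mu = np.cumsum(p * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    t = int(np.nanargmax(sigma_b))
    return gray > t                         # binary (True = above threshold)
```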

31 citations

Journal ArticleDOI
TL;DR: Significant improvements in visual quality and character recognition rates are achieved using the proposed approach, as confirmed by a detailed comparative study against state-of-the-art upscaling approaches.

Abstract: Resolution enhancement has become a valuable research topic due to the rapidly growing need for high-quality images in various applications. Various resolution enhancement approaches have been successfully applied to natural images. Nevertheless, their direct application to textual images is not efficient enough due to the specificities that distinguish these particular images from natural images. Insufficient resolution introduces a substantial loss of detail which can make a text unreadable by humans and unrecognizable by OCR systems. To address these issues, a sparse coding-based approach is proposed to enhance the resolution of a textual image. Three major contributions are presented in this paper: (1) Multiple coupled dictionaries are learned from a clustered database and selected adaptively for a better reconstruction. (2) An automatic process is developed to collect the training database, which contains writing patterns extracted from high-quality character images. (3) A new local feature descriptor well suited to writing specificities is proposed for the clustering of the training database. The performance of these propositions is evaluated qualitatively and quantitatively on various types of low-resolution textual images. Significant improvements in visual quality and character recognition rates are achieved using the proposed approach, as confirmed by a detailed comparative study against state-of-the-art upscaling approaches.
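A stripped-down sketch of the coupled-dictionary reconstruction step: sparse-code a low-resolution patch over an LR dictionary and reuse the coefficients with the paired HR atoms. The dictionary learning, the clustering of the training database, and the adaptive dictionary selection are all omitted, and scikit-learn's orthogonal matching pursuit stands in for whatever sparse solver the authors use.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def upscale_patch(lr_patch, D_lr, D_hr, hr_shape, sparsity=3):
    """Coupled-dictionary reconstruction: sparse-code the LR patch over
    D_lr, then rebuild the HR patch from the paired atoms in D_hr.
    D_lr (d_lr x K) and D_hr (d_hr x K) are assumed learned jointly."""
    y = lr_patch.ravel().astype(float)
    alpha = orthogonal_mp(D_lr, y, n_nonzero_coefs=sparsity)  # sparse code
    return (D_hr @ alpha).reshape(hr_shape)  # HR estimate from shared code
```

A full pipeline would tile the LR image into overlapping patches, pick the dictionary pair matching each patch's cluster, and average the reconstructed HR patches.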

27 citations

Proceedings ArticleDOI
26 Jul 2009
TL;DR: A modification of the Weickert coherence-enhancing diffusion filter is proposed, adding new constraints formulated from the Perona-Malik equation; the resulting filter leads to a noticeable improvement in OCR accuracy, demonstrated by comparing recognition rates before and after the diffusion process.

Abstract: A modification of the Weickert coherence-enhancing diffusion filter is proposed, to which new constraints formulated from the Perona-Malik equation are added. The new diffusion filter, driven by local tensor fields, benefits from both of these approaches and avoids problems known to affect them. This filter reinforces character discontinuities and eliminates the inherent problem of corner rounding while smoothing. Experiments conducted on degraded document images illustrate the effectiveness of the proposed method compared to other anisotropic diffusion approaches. A visual quality improvement is thus achieved on these images. Such improvement leads to a noticeable improvement in the OCR system's accuracy, demonstrated by comparing OCR recognition rates before and after the diffusion process.
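The paper's tensor-driven filter is not reproduced here; as a runnable point of reference, below is the plain Perona-Malik diffusion scheme from which the added constraints are formulated.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=15.0, lam=0.2):
    """Plain Perona-Malik diffusion with g(x) = exp(-(x/kappa)^2).
    lam <= 0.25 keeps the 4-neighbour explicit scheme stable; np.roll
    gives periodic borders, which is acceptable for a sketch."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        dN = np.roll(u, -1, axis=0) - u   # differences to the four
        dS = np.roll(u,  1, axis=0) - u   # nearest neighbours
        dE = np.roll(u, -1, axis=1) - u
        dW = np.roll(u,  1, axis=1) - u
        u += lam * (np.exp(-(dN / kappa) ** 2) * dN
                    + np.exp(-(dS / kappa) ** 2) * dS
                    + np.exp(-(dE / kappa) ** 2) * dE
                    + np.exp(-(dW / kappa) ** 2) * dW)
    return u
```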

27 citations


Cited by
Journal ArticleDOI
01 Apr 1988 - Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) is presented.

Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and poorly understood, and very little is known about mud-dominated calciclastic submarine fan systems in particular. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveal a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences: (1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; (2) low-density turbidite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and (3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones.

9,929 citations

Reference EntryDOI
15 Oct 2004

2,118 citations

Proceedings ArticleDOI
22 Aug 2005
TL;DR: This paper applies three statistical machine learning algorithms to automatically identify signatures for a range of applications and finds that this approach is highly accurate and scales to allow online application identification on high-speed links.

Abstract: An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However, this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately, deriving the signatures manually is very time consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular, we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high-speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signatures for unencrypted handshakes negotiating the encryption parameters of a particular connection.
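The abstract does not name the three learners or their features; purely as an illustration of the payload-classification idea, the sketch below labels flows by byte n-gram histograms of their first payload bytes with a naive Bayes model (toy data, and the feature hashing is a hypothetical choice).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def ngram_counts(payload, n=2, dim=256):
    """Hashed histogram of byte n-grams from the start of a flow's
    payload (the hashing trick here is purely illustrative)."""
    v = np.zeros(dim)
    for i in range(len(payload) - n + 1):
        v[hash(payload[i:i + n]) % dim] += 1
    return v

# Toy flows: (first payload bytes, application label).
flows = [(b"GET / HTTP/1.1\r\nHost: a\r\n", "http"),
         (b"POST /api HTTP/1.1\r\n", "http"),
         (b"SSH-2.0-OpenSSH_8.9\r\n", "ssh"),
         (b"SSH-2.0-libssh_0.9.6\r\n", "ssh")]
X = np.array([ngram_counts(p) for p, _ in flows])
y = [label for _, label in flows]

clf = MultinomialNB().fit(X, y)
print(clf.predict([ngram_counts(b"GET /index.html HTTP/1.0\r\n")]))  # ['http']
```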

420 citations

Journal ArticleDOI
TL;DR: This presentation clarifies both the decisions made by a table recognizer and the assumptions and inferencing techniques that underlie these decisions.
Abstract: Table characteristics vary widely. Consequently, a great variety of computational approaches have been applied to table recognition. In this survey, the table recognition literature is presented as an interaction of table models, observations, transformations, and inferences. A table model defines the physical and logical structure of tables; the model is used to detect tables and to analyze and decompose the detected tables. Observations perform feature measurements and data lookup, transformations alter or restructure data, and inferences generate and test hypotheses. This presentation clarifies both the decisions made by a table recognizer and the assumptions and inferencing techniques that underlie these decisions.
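The survey's four-part framing can be made concrete with a schematic skeleton; every name below is hypothetical, intended only to show how a table model, observations, transformations, and inferences might compose.

```python
from dataclasses import dataclass, field

@dataclass
class TableModel:
    """Hypothetical table model in the survey's sense: the physical and
    logical structure a recognizer detects and decomposes against."""
    physical: dict = field(default_factory=dict)  # rulings, cell boxes
    logical: dict = field(default_factory=dict)   # rows, columns, headers

def recognize(page, model, observe, transform, infer):
    """Schematic pipeline: observations measure features, transformations
    restructure them, inferences generate and test table hypotheses."""
    features = observe(page)          # e.g. ruling positions, white gaps
    candidates = transform(features)  # e.g. group lines into grid cells
    return [t for t in candidates if infer(t, model)]  # accepted tables
```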

334 citations

Proceedings ArticleDOI
26 Jul 2009
TL;DR: The contest details are described, including the evaluation measures used and the performance of the 43 submitted methods, along with a short description of each method.

Abstract: DIBCO 2009 is the first International Document Image Binarization Contest, organized in the context of the ICDAR 2009 conference. The general objective of the contest is to identify current advances in document image binarization using established evaluation performance measures. This paper describes the contest details, including the evaluation measures used, as well as the performance of the 43 submitted methods along with a short description of each method.
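DIBCO-style evaluation includes pixel-level measures such as F-measure; below is a minimal sketch of that measure, assuming binary images with ink pixels encoded as True.

```python
import numpy as np

def binarization_fmeasure(pred, gt):
    """Pixel-level F-measure between a predicted binary image and its
    ground truth, with ink pixels encoded as True."""
    tp = float(np.logical_and(pred, gt).sum())
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```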

296 citations