Other affiliations: Institut national des sciences appliquées de Lyon
Bio: Franck Lebourgeois is an academic researcher from the University of Lyon. The author has contributed to research in topics including cluster analysis and sparse approximation. The author has an h-index of 6 and has co-authored 14 publications receiving 141 citations. Previous affiliations of Franck Lebourgeois include the Institut national des sciences appliquées de Lyon.
30 Aug 1992
TL;DR: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents, based on a bottom-up approach to document analysis, that achieves very good performance in most cases.
Abstract: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents. The method presented is based on a bottom-up approach to document analysis and achieves very good performance in most cases. During preprocessing, characters are linked together to form blocks. The created blocks are segmented, labelled and merged into paragraphs. Simultaneously, graphics are extracted from the image. Algorithms for each step of processing are presented, and the experimental results obtained are included.
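The character-linking step described in this abstract can be sketched as follows. The box format, thresholds, and function name are illustrative assumptions, not the paper's actual algorithm:

```python
def link_characters(boxes, max_gap=5):
    """Greedily merge horizontally adjacent character boxes (x, y, w, h)
    into line blocks -- a toy version of the bottom-up linking step."""
    blocks = []
    for x, y, w, h in sorted(boxes):
        for b in blocks:
            bx, by, bw, bh = b
            # same vertical band and small horizontal gap -> same block
            if abs(by - y) <= h // 2 and 0 <= x - (bx + bw) <= max_gap:
                nx, ny = min(bx, x), min(by, y)
                b[0], b[1] = nx, ny
                b[2] = max(bx + bw, x + w) - nx
                b[3] = max(by + bh, y + h) - ny
                break
        else:
            blocks.append([x, y, w, h])
    return blocks
```

For example, three closely spaced character boxes on one baseline merge into a single block, while a distant box starts a new one.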
25 Aug 2013
TL;DR: Experimental results on character recognition illustrate that the proposed method outperforms the other methods involved in this study by providing better recognition rates.
Abstract: This paper addresses the problem of generating a super-resolved version of a low-resolution textual image by using Sparse Coding (SC), which suggests that image patches can be sparsely represented from a suitable dictionary. In order to enhance the learning performance and improve the reconstruction ability, we propose in this paper a clustered SC approach for single text image super resolution based on multiple learned dictionaries. To this end, a large High-Resolution/Low-Resolution (HR/LR) patch pair database is collected from a set of high quality character images and then partitioned into several clusters by performing an intelligent clustering algorithm. Two coupled HR/LR dictionaries are learned from each cluster. Based on the SC principle, a local patch of a LR image is represented from each LR dictionary, generating multiple sparse representations of the same patch. The representation that minimizes the reconstruction error is retained and applied to generate a local HR patch from the corresponding HR dictionary. The performance of the proposed approach is evaluated and compared visually and quantitatively to other existing methods applied to text images. In addition, experimental results on character recognition illustrate that the proposed method outperforms the other methods involved in this study by providing better recognition rates.
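The cluster-selection principle in this abstract (code a LR patch against each cluster's LR dictionary, keep the code with lowest residual, decode from the coupled HR dictionary) can be sketched as below. A simple greedy matching pursuit stands in for the paper's sparse coder; the function name and sparsity level are assumptions:

```python
import numpy as np

def best_cluster_sr(lr_patch, lr_dicts, hr_dicts, k=3):
    """Represent the LR patch from each cluster's LR dictionary, retain
    the representation with the smallest reconstruction error, and decode
    the HR patch from the corresponding coupled HR dictionary."""
    best = None
    for D_l, D_h in zip(lr_dicts, hr_dicts):
        code = np.zeros(D_l.shape[1])
        r = lr_patch.astype(float).copy()
        for _ in range(k):                   # greedy k-term approximation
            j = np.argmax(np.abs(D_l.T @ r))
            a = D_l[:, j] @ r                # atoms assumed unit-norm
            code[j] += a
            r = r - a * D_l[:, j]
        err = np.linalg.norm(r)
        if best is None or err < best[0]:
            best = (err, D_h @ code)
    return best[1]
```

A cluster whose dictionary represents the patch exactly wins the selection, and the HR patch is then a linear combination of that cluster's HR atoms.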
16 Dec 2012
TL;DR: The performance of the proposed Super-Resolution method is evaluated and compared visually and quantitatively to other existing SR methods applied to text images, and the influence of text image resolution on automatic recognition performance is examined.
Abstract: This paper addresses the problem of generating a super-resolved text image from a single low-resolution image. The proposed Super-Resolution (SR) method is based on sparse coding, which suggests that image patches can be well represented as a sparse linear combination of elements from a suitably chosen learned dictionary. Toward this strategy, a High-Resolution/Low-Resolution (HR/LR) patch pair database is collected from high quality character images. To our knowledge, it is the first generic database allowing SR of text images that may be contained in documents, signs, labels, bills, etc. This database is used to jointly train two dictionaries. The sparse representation of a LR image patch from the first dictionary can be applied to generate a HR image patch from the second dictionary. The performance of this approach is evaluated and compared visually and quantitatively to other existing SR methods applied to text images. In addition, we examine the influence of text image resolution on automatic recognition performance and further justify the effectiveness of the proposed SR method compared to others.
24 Aug 2014
TL;DR: A coupled dictionary learning approach is proposed to generate dual dictionaries representing coupled feature spaces, reducing computational complexity while improving image quality.
Abstract: Sparse coding is widely known as a methodology in which an input signal can be sparsely represented from a suitable dictionary. It has been successfully applied to a wide range of applications, such as textual image Super-Resolution. Nevertheless, its computational complexity severely limits its application. To reduce this complexity, a coupled dictionary learning approach is proposed to generate dual dictionaries representing coupled feature spaces. Under this approach, we optimize the training of a first dictionary for the high-resolution image space; a second dictionary is then simply deduced from the first for the low-resolution image space. In contrast with classical dictionary learning approaches, the proposed approach allows a noticeable speedup and a major simplification of the coupled dictionary learning phase, both in terms of algorithm architecture and computational complexity. Furthermore, the resolution enhancement results achieved by applying the proposed approach to poorly resolved textual images lead to image quality improvements.
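The deduction step this abstract describes (train the HR dictionary first, then derive the LR dictionary from it) can be sketched as a least-squares fit: given the sparse codes A produced while training the HR dictionary, the coupled LR dictionary is the D_l that best reproduces the LR training patches from those same codes. The function name and normalisation are assumptions:

```python
import numpy as np

def deduce_lr_dictionary(Y_l, A):
    """Given LR training patches Y_l (d_l x n) and the sparse codes A
    (k x n) obtained while training the HR dictionary, deduce the coupled
    LR dictionary as the least-squares solution of D_l @ A ~ Y_l."""
    D_l = Y_l @ np.linalg.pinv(A)
    # normalise atoms so sparse coefficients stay comparable across spaces
    norms = np.linalg.norm(D_l, axis=0)
    return D_l / np.where(norms > 0, norms, 1)
```

Because the LR dictionary is obtained in closed form from quantities already computed, the coupled learning phase reduces to a single pseudo-inverse, which is the source of the claimed speedup.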
14 Jan 2015
TL;DR: The GP-STM extends traditional linear Style Transfer Mapping with Gaussian process and kernel methods; accuracy increases by nearly 15 percentage points, and the impact of the number of training samples is evaluated.
Abstract: Historical Chinese character recognition is very important for large-scale historical document digitization, but it is a very challenging problem due to the lack of labeled training samples. This paper proposes a novel non-linear transfer learning method, namely Gaussian Process Style Transfer Mapping (GP-STM). The GP-STM extends traditional linear Style Transfer Mapping (STM) by using Gaussian process and kernel methods. With GP-STM, existing printed Chinese character samples are used to help the recognition of historical Chinese characters. To demonstrate this framework, we compare feature extraction methods, train a modified quadratic discriminant function (MQDF) classifier on printed Chinese character samples, and implement the GP-STM model on Dunhuang historical documents. Various kernels and parameters are explored, and the impact of the number of training samples is evaluated. Experimental results show that accuracy increases by nearly 15 percentage points (from 42.8% to 57.5%) using GP-STM, with an improvement of more than 8 percentage points (from 49.2% to 57.5%) compared to the STM approach.
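The non-linear style mapping idea can be sketched with kernel ridge regression, whose predictor has the same form as a GP posterior mean: historical-style features are mapped toward the printed-style feature space before classification. This is a simplified stand-in under assumed kernel and hyperparameters, not the paper's actual GP-STM formulation:

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """RBF kernel matrix between row-feature sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_style_map(X_tgt, X_src, gamma=1.0, reg=1e-3):
    """Kernel ridge regression from target-style (historical) features to
    source-style (printed) features -- the GP-posterior-mean form."""
    K = rbf(X_tgt, X_tgt, gamma) + reg * np.eye(len(X_tgt))
    W = np.linalg.solve(K, X_src)          # dual weights
    return lambda X: rbf(X, X_tgt, gamma) @ W
```

A classifier trained on printed samples (such as the MQDF mentioned above) would then be applied to the mapped features.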
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented, along with a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
23 Aug 2020
TL;DR: The LayoutLM is proposed to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
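The joint text-and-layout modelling described above rests on 2-D position embeddings: each token's embedding is summed with embeddings of its bounding-box coordinates, normalised to a 0-1000 grid as in LayoutLM. The sketch below uses toy table sizes and random embeddings purely for illustration, not the model's real configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, coords, dim = 100, 1001, 16       # toy sizes (assumptions)
word_emb = rng.normal(size=(vocab, dim))
x_emb = rng.normal(size=(coords, dim))   # shared table for x0 and x1
y_emb = rng.normal(size=(coords, dim))   # shared table for y0 and y1

def layout_embedding(token_id, bbox):
    """Token embedding + 2-D position embeddings of its bounding box
    (x0, y0, x1, y1), coordinates normalised to [0, 1000]."""
    x0, y0, x1, y1 = bbox
    return (word_emb[token_id]
            + x_emb[x0] + y_emb[y0]
            + x_emb[x1] + y_emb[y1])
```

The key property is that the same word at two different positions on the page yields two different input vectors, which is what lets the Transformer exploit layout.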
TL;DR: A new document model which preserves top-down generation information is proposed, based on which a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis.
Abstract: Transforming a paper document to its electronic version in a form suitable for efficient storage, retrieval, and interpretation continues to be a challenging problem. An efficient representation scheme for document images is necessary to solve this problem. Document representation involves techniques of thresholding, skew detection, geometric layout analysis, and logical layout analysis. The derived representation can then be used in document storage and retrieval. Page segmentation is an important stage in representing document images obtained by scanning journal pages. The performance of a document understanding system greatly depends on the correctness of page segmentation and labeling of different regions such as text, tables, images, drawings, and rulers. We use the traditional bottom-up approach based on the connected component extraction to efficiently implement page segmentation and region identification. A new document model which preserves top-down generation information is proposed based on which a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis. Our algorithm has a high accuracy and takes approximately 1.4 seconds on a SGI Indy workstation for model creation, including orientation estimation, segmentation, and labeling (text, table, image, drawing, and ruler) for a 2550×3300 image of a typical journal page scanned at 300 dpi. This method is applicable to documents from various technical journals and can accommodate moderate amounts of skew and noise.
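The connected component extraction that underlies this bottom-up approach can be sketched with a BFS labelling of a binary image. This is a minimal illustration, not the paper's implementation:

```python
import numpy as np
from collections import deque

def connected_components(img):
    """4-connected component labelling of a binary image by BFS.
    Returns a label image and the number of components found."""
    labels = np.zeros(img.shape, dtype=int)
    current = 0
    for i, j in zip(*np.nonzero(img)):
        if labels[i, j]:
            continue
        current += 1
        q = deque([(i, j)])
        labels[i, j] = current
        while q:
            a, b = q.popleft()
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                x, y = a + da, b + db
                if (0 <= x < img.shape[0] and 0 <= y < img.shape[1]
                        and img[x, y] and not labels[x, y]):
                    labels[x, y] = current
                    q.append((x, y))
    return labels, current
```

Each labelled component (roughly one character or graphic element) is then classified and merged into larger regions during segmentation.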
TL;DR: In this article, a new adaptation layer is proposed to reduce the mismatch between training and test data on a particular source layer, and the adaptation process can be efficiently and effectively implemented in an unsupervised manner.
Abstract: Recent deep learning based methods have achieved the state-of-the-art performance for handwritten Chinese character recognition (HCCR) by learning discriminative representations directly from raw data. Nevertheless, we believe that the long and well investigated domain-specific knowledge should still help to boost the performance of HCCR. By integrating the traditional normalization-cooperated direction-decomposed feature map (directMap) with the deep convolutional neural network (convNet), we are able to obtain new highest accuracies for both online and offline HCCR on the ICDAR-2013 competition database. With this new framework, we can eliminate the need for data augmentation and model ensembles, which are widely used in other systems to achieve their best results. This makes our framework efficient and effective for both training and testing. Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective. A new adaptation layer is proposed to reduce the mismatch between training and test data on a particular source layer. The adaptation process can be efficiently and effectively implemented in an unsupervised manner. By adding the adaptation layer into the pre-trained convNet, it can adapt to the new handwriting styles of particular writers, and the recognition accuracy can be further improved consistently and significantly. This paper gives an overview and comparison of recent deep learning based approaches for HCCR, and also sets new benchmarks for both online and offline HCCR.
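The unsupervised adaptation idea can be sketched as follows: pseudo-label each writer-specific feature with its nearest class centre, then fit an affine map that pulls features toward those centres. This is a simplified stand-in under assumed names and a least-squares objective, not the paper's actual adaptation-layer formulation:

```python
import numpy as np

def adaptation_layer(F, class_means):
    """Unsupervised affine adaptation: pseudo-label each feature row of F
    with its nearest class mean, then fit (A, b) by least squares so that
    A @ f + b moves writer features toward the source feature space."""
    d2 = ((F[:, None, :] - class_means[None, :, :]) ** 2).sum(-1)
    targets = class_means[d2.argmin(1)]           # pseudo-labels
    X = np.hstack([F, np.ones((len(F), 1))])      # affine map via bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return lambda G: np.hstack([G, np.ones((len(G), 1))]) @ W
```

Applied between a source layer and the classifier, such a map needs no labels from the new writer, matching the unsupervised setting described above.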
01 Jul 2017
TL;DR: Li et al. present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images, which treats document semantic structure extraction as a pixel-wise segmentation task and proposes a unified model that classifies pixels based not only on their visual appearance but also on the content of the underlying text.
Abstract: We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
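The multimodal input this abstract describes can be sketched by rasterising word embeddings into a per-pixel text map and stacking it with the image channel before the convolutional network. Box format, function name, and channel layout are illustrative assumptions:

```python
import numpy as np

def multimodal_input(image, word_boxes, word_vecs):
    """Build a pixel-wise multimodal input: paint each word's embedding
    over its bounding box (x0, y0, x1, y1) and stack the resulting text
    map with the grayscale image channel."""
    h, w = image.shape
    dim = word_vecs.shape[1]
    text_map = np.zeros((h, w, dim))
    for (x0, y0, x1, y1), v in zip(word_boxes, word_vecs):
        text_map[y0:y1, x0:x1] = v
    return np.concatenate([image[..., None], text_map], axis=-1)
```

The segmentation network then sees, at every pixel, both appearance and a representation of the text printed there, which is what allows labels like "caption" versus "paragraph" to be resolved.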