scispace - formally typeset
Search or ask a question
Author

Jean Hennebert

Other affiliations: Wallis, École Normale Supérieure, Verisign  ...read more
Bio: Jean Hennebert is an academic researcher from University of Fribourg. The author has contributed to research in topics: Speaker recognition & Hidden Markov model. The author has an hindex of 26, co-authored 145 publications receiving 2484 citations. Previous affiliations of Jean Hennebert include Wallis & École Normale Supérieure.


Papers
More filters
Journal ArticleDOI
TL;DR: A new multimodal biometric database designed and acquired within the framework of the European BioSecure Network of Excellence is presented, comprised of more than 600 individuals acquired simultaneously in three scenarios: over the Internet, in an office environment with desktop PC, and in indoor/outdoor environments with mobile portable hardware.
Abstract: A new multimodal biometric database designed and acquired within the framework of the European BioSecure Network of Excellence is presented. It is comprised of more than 600 individuals acquired simultaneously in three scenarios: 1 over the Internet, 2 in an office environment with desktop PC, and 3 in indoor/outdoor environments with mobile portable hardware. The three scenarios include a common part of audio/video data. Also, signature and fingerprint data have been acquired both with desktop PC and mobile portable hardware. Additionally, hand and iris data were acquired in the second scenario using desktop PC. Acquisition has been conducted by 11 European institutions. Additional features of the BioSecure Multimodal Database (BMDB) are: two acquisition sessions, several sensors in certain modalities, balanced gender and age distributions, multimodal realistic scenarios with simple and quick tasks per modality, cross-European diversity, availability of demographic data, and compatibility with other multimodal databases. The novel acquisition conditions of the BMDB allow us to perform new challenging research and evaluation of either monomodal or multimodal biometric systems, as in the recent BioSecure Multimodal Evaluation campaign. A description of this campaign including baseline results of individual modalities from the new database is also given. The database is expected to be available for research purposes through the BioSecure Association during 2008.

233 citations

Proceedings ArticleDOI
26 Jul 2009
TL;DR: The purpose of this database is the large-scale benchmarking of open-vocabulary,multi-font, multi-size and multi-style text recognition systems in Arabic.
Abstract: We report on the creation of a database composed of images of Arabic Printed words. The purpose of this database is the large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style text recognition systems in Arabic. The challenges that are addressed by the database are in the variability of the sizes, fonts and style used to generate the images. A focus is also given on low-resolution images where anti-aliasing is generating noise on the characters to recognize. The database is synthetically generated using a lexicon of 113’284 words, 10 Arabic fonts, 10 font sizes and 4 font styles. The database contains 45’313’600 single word images totaling to more than 250 million characters. Ground truth annotation is provided for each image. The database is called APTI for Arabic Printed Text Images.

130 citations

Proceedings ArticleDOI
24 Aug 2014
TL;DR: This paper provides a survey of current researches on Intrusive Load Monitoring (ILM) techniques, focusing on feature extraction and machine learning algorithms typically used for ILM applications.
Abstract: Electricity load monitoring of appliances has become an important task considering the recent economic and ecological trends. In this game, machine learning has an important part to play, allowing for energy consumption understanding, critical equipment monitoring and even human activity recognition. This paper provides a survey of current researches on Intrusive Load Monitoring (ILM) techniques. ILM relies on low-end electricity meter devices spread inside the habitations, as opposed to Non-Intrusive Load Monitoring (NILM) that relies on an unique point of measurement, the smart meter. Potential applications and principles of ILMs are presented and compared to NILM. A focus is also given on feature extraction and machine learning algorithms typically used for ILM applications.

119 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: This paper considers page segmentation as a pixel labeling problem, i.e., each pixel is classified as either periphery, background, text block, or decoration, and applies convolutional autoencoders to learn features directly from pixel intensity values.
Abstract: In this paper, we present an unsupervised feature learning method for page segmentation of historical handwritten documents available as color images. We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as either periphery, background, text block, or decoration. Traditional methods in this area rely on carefully hand-crafted features or large amounts of prior knowledge. In contrast, we apply convolutional autoencoders to learn features directly from pixel intensity values. Then, using these features to train an SVM, we achieve high quality segmentation without any assumption of specific topologies and shapes. Experiments on three public datasets demonstrate the effectiveness and superiority of the proposed approach.

110 citations

01 Jan 2010
TL;DR: Several computer vision approaches have been developed for skin detection, which typically transforms a given pixel into an appropriate color space and then uses a skin classifier to label the pixel whether it is a ski n or a non-skin pixel.
Abstract: Skin detection is the process of finding skin-colored pixels and regions in an image or a video. This process is typically used as a preprocessing step to find regions that potentially have human faces and limbs in images. Several computer vision approach es have been developed for skin detection. A skin detector typically transforms a given pix el into an appropriate color space and then use a skin classifier to label the pixel whether it is a ski n or a non-skin pixel. A skin classifier defines a decision boundary of the skin color class in the colo r space based on a training database of skin-colored pixels.

92 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

3,120 citations

Book
09 Feb 2012
TL;DR: A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

2,101 citations

01 Jan 1979
TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis and addressing interesting real-world computer Vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that we have some classes containing lots of training data and many classes contain a small amount of training data. Therefore, how to use frequent classes to help learning rare classes for which it is harder to collect the training data is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different level of components that can be shared during concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, etc. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works, as well as literature reviews, are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to: • Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis • Deep learning for large-scale computer vision and multimedia analysis • Multi-modal approach for large-scale computer vision and multimedia analysis • Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, • Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing. • New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem. • Survey papers regarding the topic of learning with shared information. Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Proceedings ArticleDOI
14 Jun 2018
TL;DR: In this article, a large-scale audio-visual speaker recognition dataset, VoxCeleb2, is presented, which contains over a million utterances from over 6,000 speakers.
Abstract: The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2 which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin.

1,289 citations

Journal ArticleDOI
TL;DR: This exhaustive literature review provides a concrete definition of Industry 4.0 and defines its six design principles such as interoperability, virtualization, local, real-time talent, service orientation and modularity.
Abstract: Manufacturing industry profoundly impact economic and societal progress. As being a commonly accepted term for research centers and universities, the Industry 4.0 initiative has received a splendid attention of the business and research community. Although the idea is not new and was on the agenda of academic research in many years with different perceptions, the term “Industry 4.0” is just launched and well accepted to some extend not only in academic life but also in the industrial society as well. While academic research focuses on understanding and defining the concept and trying to develop related systems, business models and respective methodologies, industry, on the other hand, focuses its attention on the change of industrial machine suits and intelligent products as well as potential customers on this progress. It is therefore important for the companies to primarily understand the features and content of the Industry 4.0 for potential transformation from machine dominant manufacturing to digital manufacturing. In order to achieve a successful transformation, they should clearly review their positions and respective potentials against basic requirements set forward for Industry 4.0 standard. This will allow them to generate a well-defined road map. There has been several approaches and discussions going on along this line, a several road maps are already proposed. Some of those are reviewed in this paper. However, the literature clearly indicates the lack of respective assessment methodologies. Since the implementation and applications of related theorems and definitions outlined for the 4th industrial revolution is not mature enough for most of the reel life implementations, a systematic approach for making respective assessments and evaluations seems to be urgently required for those who are intending to speed this transformation up. It is now main responsibility of the research community to developed technological infrastructure with physical systems, management models, business models as well as some well-defined Industry 4.0 scenarios in order to make the life for the practitioners easy. It is estimated by the experts that the Industry 4.0 and related progress along this line will have an enormous effect on social life. As outlined in the introduction, some social transformation is also expected. It is assumed that the robots will be more dominant in manufacturing, implanted technologies, cooperating and coordinating machines, self-decision-making systems, autonom problem solvers, learning machines, 3D printing etc. will dominate the production process. Wearable internet, big data analysis, sensor based life, smart city implementations or similar applications will be the main concern of the community. This social transformation will naturally trigger the manufacturing society to improve their manufacturing suits to cope with the customer requirements and sustain competitive advantage. A summary of the potential progress along this line is reviewed in introduction of the paper. It is so obvious that the future manufacturing systems will have a different vision composed of products, intelligence, communications and information network. This will bring about new business models to be dominant in industrial life. Another important issue to take into account is that the time span of this so-called revolution will be so short triggering a continues transformation process to yield some new industrial areas to emerge. This clearly puts a big pressure on manufacturers to learn, understand, design and implement the transformation process. Since the main motivation for finding the best way to follow this transformation, a comprehensive literature review will generate a remarkable support. This paper presents such a review for highlighting the progress and aims to help improve the awareness on the best experiences. It is intended to provide a clear idea for those wishing to generate a road map for digitizing the respective manufacturing suits. By presenting this review it is also intended to provide a hands-on library of Industry 4.0 to both academics as well as industrial practitioners. The top 100 headings, abstracts and key words (i.e. a total of 619 publications of any kind) for each search term were independently analyzed in order to ensure the reliability of the review process. Note that, this exhaustive literature review provides a concrete definition of Industry 4.0 and defines its six design principles such as interoperability, virtualization, local, real-time talent, service orientation and modularity. It seems that these principles have taken the attention of the scientists to carry out more variety of research on the subject and to develop implementable and appropriate scenarios. A comprehensive taxonomy of Industry 4.0 can also be developed through analyzing the results of this review.

1,011 citations