
Showing papers by "Lori Lamel published in 2010"


Proceedings Article
01 Jan 2010
TL;DR: There is significant performance degradation of a baseline system trained on only US data when confronted with shows from other regions, but results improve significantly when data from all the regions are included for accent-independent acoustic model training.
Abstract: Accent variability is an important factor in speech that can significantly degrade automatic speech recognition performance. We investigate the effect of multiple accents on an English broadcast news recognition system. A multi-accented English corpus is used for the task, including broadcast news segments from 6 different geographic regions: US, Great Britain, Australia, North Africa, Middle East and India. There is significant performance degradation of a baseline system trained on only US data when confronted with shows from other regions. The results improve significantly when data from all the regions are included for accent-independent acoustic model training. Further improvements are achieved when MAP-adapted accent-dependent models are used in conjunction with a GMM accent classifier.
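The GMM accent classifier mentioned above can be illustrated with a toy sketch: score an utterance's feature frames under one diagonal-covariance GMM per accent and pick the highest total log-likelihood. This is not the paper's implementation; the model parameters and accent labels below are hypothetical, and real systems would train the GMMs on accent-labeled acoustic features.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D) acoustic features (e.g. MFCCs)
    weights: (K,) mixture weights; means, variances: (K, D)
    """
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_comp = (np.log(weights)[None, :]
                - 0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal))
    # Log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def classify_accent(frames, accent_gmms):
    """Pick the accent whose GMM assigns the highest log-likelihood."""
    return max(accent_gmms, key=lambda a: diag_gmm_loglik(frames, *accent_gmms[a]))
```

In a full system the winning accent would then select the corresponding MAP-adapted accent-dependent acoustic models for decoding.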

56 citations


01 Jan 2010
TL;DR: This paper describes the development of a speech-to-text transcription system for the Finnish language, carried out without any detailed manual transcriptions, relying instead on several sources of audio and textual data found on the web.
Abstract: This paper describes the development of a speech-to-text transcription system for the Finnish language. Finnish is a Finno-Ugric language spoken by about 6 million people living in Finland, but also by some minorities in Sweden, Norway, Russia and Estonia. System development was carried out without any detailed manual transcriptions, relying instead on several sources of audio and textual data found on the web. Some of the audio sources were associated with approximate (and usually partial) texts, which were used to provide estimates of system performance.

18 citations


Book ChapterDOI
16 Aug 2010
TL;DR: Two methods based on statistical machine translation (SMT) are used to generate multiple pronunciations from the canonical pronunciation of a word: the first uses a machine translation tool for phoneme-to-phoneme (p2p) conversion, while the second adapts a pivot method proposed for paraphrase extraction.
Abstract: Multiple-pronunciation dictionaries are often used by automatic speech recognition systems in order to account for different speaking styles. In this paper, two methods based on statistical machine translation (SMT) are used to generate multiple pronunciations from the canonical pronunciation of a word. In the first method, a machine translation tool is used to perform phoneme-to-phoneme (p2p) conversion and derive variants from a given canonical pronunciation. The second method is based on a pivot method proposed for the paraphrase extraction task. The two methods are compared under different training conditions which allow single or multiple pronunciations in the training set, and their performance is evaluated in terms of recall and precision measures.
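The recall and precision measures used above compare the set of generated pronunciation variants against a reference set. A minimal sketch, with hypothetical phone strings standing in for real dictionary entries:

```python
def variant_recall_precision(generated, reference):
    """Recall and precision of generated pronunciation variants
    against a reference set (both given as sets of phone strings)."""
    generated, reference = set(generated), set(reference)
    hits = generated & reference
    recall = len(hits) / len(reference) if reference else 0.0
    precision = len(hits) / len(generated) if generated else 0.0
    return recall, precision

# Hypothetical variants for an English word (phones are illustrative)
ref = {"ih n t r ah s t", "ih n t er ah s t", "ih n t r ih s t"}
gen = {"ih n t r ah s t", "ih n t er ah s t", "ih n er ah s t"}
# 2 of 3 reference variants recovered (recall 2/3);
# 2 of 3 generated variants are correct (precision 2/3)
```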

17 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper explores three approaches (adaptation, full training, and feature merging) to using condition-specific MLP features in a state-of-the-art BN STT system for French; the feature-merging approach without condition-specific adaptation was found to outperform the original models with condition-specific adaptation.
Abstract: It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multi-layer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions, and if so, how to incorporate the condition-specific MLP features in the system. This paper explores three approaches (adaptation, full training, and feature merging) to use condition-specific MLP features in a state-of-the-art BN STT system for French. The third approach without condition-specific adaptation was found to outperform the original models with condition-specific adaptation, and to perform almost as well as full training of multiple condition-specific HMMs.
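Feature merging, the third approach above, amounts to concatenating the MLP-derived features with the standard cepstral stream at the frame level before acoustic model training. A minimal sketch, assuming the two streams are already frame-synchronous (the function name and dimensions are illustrative, not from the paper):

```python
import numpy as np

def merge_features(cepstral, mlp):
    """Frame-level feature merging: concatenate a cepstral stream (T, D1)
    with MLP-derived features (T, D2) into a single (T, D1 + D2) stream.
    Both streams must have the same number of frames T."""
    if cepstral.shape[0] != mlp.shape[0]:
        raise ValueError("streams must be frame-synchronous")
    return np.hstack([cepstral, mlp])
```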

11 citations


Proceedings Article
01 Jan 2010
TL;DR: This paper reports on the impact of the so-called acoustic scale factor on system accuracy when using lattice-based training, and on the use of n-gram cutoff and entropy pruning techniques.
Abstract: This paper investigates various techniques to improve the estimation of n-gram phonotactic models for language recognition using single-best phone transcriptions and phone lattices. More precisely, we first report on the impact of the so-called acoustic scale factor on the system accuracy when using lattice-based training, and then we report on the use of n-gram cutoff and entropy pruning techniques. Several system configurations are explored, such as the use of context-independent and context-dependent phone models, the use of single-best phone hypotheses versus phone lattices, and the use of various n-gram orders. Experiments are conducted using the LRE 2007 evaluation data and the results are reported using the a posteriori EER. The results show that the impact of these techniques on the system accuracy is highly dependent on the training conditions and that careful optimization can lead to performance improvements.
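Two of the ingredients above can be sketched together: in lattice-based training, expected phone n-gram counts are accumulated from alternative paths whose posteriors are sharpened or flattened by the acoustic scale factor, and a count cutoff then drops low-count n-grams before model estimation. This toy version scores a handful of explicit paths rather than a real lattice, and the scale and cutoff values are illustrative.

```python
import math
from collections import Counter

def expected_ngram_counts(paths, scale=0.1, n=2):
    """Expected phone n-gram counts from alternative decoding paths.

    paths: list of (phone_sequence, acoustic_log_score) pairs; path
    posteriors are softmax(scale * log_score), so a larger acoustic
    scale factor lets the best path dominate more sharply.
    """
    m = max(scale * s for _, s in paths)
    post = [math.exp(scale * s - m) for _, s in paths]
    z = sum(post)
    counts = Counter()
    for (phones, _), p in zip(paths, post):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += p / z
    return counts

def apply_cutoff(counts, cutoff=0.5):
    """n-gram cutoff: drop n-grams whose expected count is at or below it."""
    return {g: c for g, c in counts.items() if c > cutoff}
```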

10 citations


01 Jan 2010
TL;DR: This paper reports on ongoing work to take Luxembourgish on board as an e-language, an electronically searchable spoken language, focusing on the issue of producing acoustic seed models for Luxembourgish.
Abstract: The national language of the Grand-Duchy of Luxembourg, Luxembourgish, has often been characterized as one of Europe's under-described and under-resourced languages. In this contribution we report on our ongoing work to take Luxembourgish on board as an e-language: an electronically searchable spoken language. More specifically, we focus on the issue of producing acoustic seed models for Luxembourgish. A phonemic inventory was defined and linked to inventories from major neighboring languages (German, French and English) with the help of the IPA symbol set. Acoustic seed model sets were composed using monolingual German, French or English acoustic model sets, and the corresponding forced alignment segmentations were compared. Next, a super-set of multilingual acoustic seeds was used, putting together the three language-dependent sets. The language identity of the aligned acoustic models provides information about the overall acoustic adequacy of both the cross-language phonemic correspondences and the acoustic models. Furthermore, some information can be gleaned on inter-language distances: the German acoustic models provided the best match, with 54.3% of the segments aligned using German seeds, 35.3% using the English ones and only 10.4% using the French acoustic models. Since Luxembourgish is considered a Western Germanic language close to German, this result is in line with its linguistic typology.
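The IPA-based linking of inventories can be pictured as a lookup table from each target-language phoneme to a seed model in one of the donor languages. The table fragment below is hypothetical (the actual correspondences are defined in the paper, not reproduced here); the sketch only shows the bookkeeping of mapped versus uncovered phonemes.

```python
# Hypothetical fragment of an IPA correspondence table: each Luxembourgish
# phoneme (IPA symbol) is linked to a seed model from German (de),
# French (fr) or English (en) via the shared or closest IPA symbol.
SEED_TABLE = {
    "ɑ": ("de", "a"),    # open back vowel, German seed
    "æ": ("en", "ae"),   # near-open front vowel, English-only seed
    "ʒ": ("fr", "zh"),   # voiced postalveolar fricative, French seed
    "ʃ": ("de", "sch"),
}

def seed_models(phonemes):
    """Map a phoneme inventory to (language, model) seed pairs,
    collecting phonemes with no correspondence separately."""
    seeds, missing = {}, []
    for ph in phonemes:
        if ph in SEED_TABLE:
            seeds[ph] = SEED_TABLE[ph]
        else:
            missing.append(ph)
    return seeds, missing
```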

5 citations


Proceedings ArticleDOI
01 Nov 2010
TL;DR: The goal of this work is to assess the capacity of random forest language models estimated on a very large text corpus to improve the performance of an STT system; a Forest of Random Forests language modeling scheme is introduced.
Abstract: The goal of this work is to assess the capacity of random forest language models estimated on a very large text corpus to improve the performance of an STT system. Previous experiments with random forests were mainly concerned with small or medium size data tasks. In this work the development version of the 2009 LIMSI Mandarin Chinese STT system was chosen as a challenging baseline to improve upon. This system is characterized by a language model trained on a very large text corpus (over 3.2 billion segmented words) making the baseline 4-gram estimates particularly robust. We observed moderate perplexity and CER improvements when this model is interpolated with a random forest language model. In order to attain the goal we tried different strategies to build random forests on the available data and introduced a Forest of Random Forests language modeling scheme. However, the improvements we get for large data over a well-tuned baseline N-gram model are less impressive than those reported for smaller data tasks.
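The interpolation step above combines the baseline 4-gram model and the random forest model word by word, and the gain is measured in perplexity. A minimal sketch of that measurement, given each model's per-word probabilities over a test sequence (the interpolation weight is illustrative, not the paper's tuned value):

```python
import math

def interpolated_perplexity(probs_a, probs_b, lam=0.5):
    """Perplexity of the linearly interpolated model
    p(w) = lam * p_a(w) + (1 - lam) * p_b(w) over a test sequence,
    given each model's per-word probabilities."""
    assert len(probs_a) == len(probs_b)
    log_sum = sum(math.log(lam * pa + (1 - lam) * pb)
                  for pa, pb in zip(probs_a, probs_b))
    return math.exp(-log_sum / len(probs_a))
```

If both models assign uniform probability 1/4 to every word, the interpolated perplexity is 4; mixing in a model that assigns higher probabilities lowers it.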

3 citations


Proceedings Article
01 May 2010
TL;DR: A methodology for a semi-automatic evaluation of QAST systems based on time slot comparisons is introduced, along with the QAST Evaluation Package 2007-2009 resulting from these evaluation campaigns.
Abstract: Question Answering (QA) technology aims at providing relevant answers to natural language questions. Most Question Answering research has focused on mining document collections containing written texts to answer written questions. In addition to written sources, a large (and growing) amount of potentially interesting information appears in spoken documents, such as broadcast news, speeches, seminars, meetings or telephone conversations. The QAST track (Question-Answering on Speech Transcripts) was introduced in CLEF to investigate the problem of question answering in such audio documents. This paper describes in detail the evaluation protocol and tools designed and developed for the CLEF-QAST evaluation campaigns that took place between 2007 and 2009. We first review the data, question sets, and submission procedures that were produced or set up during these three campaigns. As for the evaluation procedure, the interface that was developed to ease the assessors' work is described. In addition, this paper introduces a methodology for a semi-automatic evaluation of QAST systems based on time slot comparisons. Finally, the QAST Evaluation Package 2007-2009 resulting from these evaluation campaigns is also introduced.
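The time slot comparison idea can be sketched as an interval overlap test: a system answer, located in the audio by its start and end times, is counted correct if its span overlaps a reference slot. This is a simplified illustration of the principle, not the CLEF-QAST protocol itself; the tolerance parameter is a hypothetical widening of the reference slot.

```python
def slot_match(answer, reference, tolerance=0.0):
    """True if the answer's time span (start, end), in seconds,
    overlaps the reference slot, optionally widened by a tolerance."""
    a_start, a_end = answer
    r_start = reference[0] - tolerance
    r_end = reference[1] + tolerance
    return a_start <= r_end and a_end >= r_start

def judge(answers, references, tolerance=0.0):
    """Mark each system answer correct if it matches any reference slot."""
    return [any(slot_match(a, r, tolerance) for r in references)
            for a in answers]
```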

3 citations


Proceedings Article
01 Jan 2010
TL;DR: Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models with only a limited usage of French.
Abstract: Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and has often been viewed as one of Europe's under-resourced languages. We focus on the acoustic modeling of Luxembourgish. By taking advantage of monolingual acoustic seeds selected from German, French or English model sets via IPA symbol correspondences, we investigated whether Luxembourgish spoken words were globally better represented by one of these languages. Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models, with only a limited usage of French. German models provided the best match with 54% of the data, versus 35% for English and only 11% for French models. A set of multilingual acoustic models, estimated on the pooled German, French, and English audio data, captured 27% to 48% of the data depending on conditions. Index Terms: multilingual alignment, acoustic seed models, under-resourced languages, Luxembourgish, English, French, German.

2 citations


Book ChapterDOI
16 Aug 2010
TL;DR: In this work the random forest language modeling approach is applied with the aim of improving the performance of the highly competitive LIMSI Mandarin Chinese speech-to-text system, and a Forest of Random Forests language modeling scheme is introduced.
Abstract: In this work the random forest language modeling approach is applied with the aim of improving the performance of the highly competitive LIMSI Mandarin Chinese speech-to-text system. The experimental setup is that of the GALE Phase 4 evaluation. This setup is characterized by a large amount of available language model training data (over 3.2 billion segmented words). A conventional unpruned 4-gram language model with a vocabulary of 56K words serves as a baseline that is challenging to improve upon. However, moderate perplexity and CER improvements over this model were obtained with a random forest language model. Different random forest training strategies were explored so as to attain the maximal gain in performance, and a Forest of Random Forests language modeling scheme is introduced.

2 citations