
Showing papers by "Lori Lamel published in 2010"


Proceedings Article
01 Jan 2010
TL;DR: There is significant performance degradation of a baseline system trained on only US data when confronted with shows from other regions, but results improve significantly when data from all the regions are included for accent-independent acoustic model training.
Abstract: Accent variability is an important factor in speech that can significantly degrade automatic speech recognition performance. We investigate the effect of multiple accents on an English broadcast news recognition system. A multi-accented English corpus is used for the task, including broadcast news segments from 6 different geographic regions: US, Great Britain, Australia, North Africa, Middle East and India. There is significant performance degradation of a baseline system trained on only US data when confronted with shows from other regions. The results improve significantly when data from all the regions are included for accent-independent acoustic model training. Further improvements are achieved when MAP-adapted accent-dependent models are used in conjunction with a GMM accent classifier.
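The GMM accent classifier mentioned above can be illustrated with a toy sketch: score an utterance's feature frames under one diagonal-covariance GMM per accent and pick the highest total log-likelihood. This is not the paper's implementation; the model parameters and accent labels below are hypothetical, and real systems would train the GMMs on accent-labeled acoustic features.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D) acoustic features (e.g. MFCCs)
    weights: (K,) mixture weights; means, variances: (K, D)
    """
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_comp = (np.log(weights)[None, :]
                - 0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal))
    # Log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def classify_accent(frames, accent_gmms):
    """Pick the accent whose GMM assigns the highest log-likelihood."""
    return max(accent_gmms, key=lambda a: diag_gmm_loglik(frames, *accent_gmms[a]))
```

In a full system the winning accent would then select the corresponding MAP-adapted accent-dependent acoustic models for decoding.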

56 citations


01 Jan 2010
TL;DR: This paper describes the development of a speech-to-text transcription system for the Finnish language, carried out without any detailed manual transcriptions, relying instead on several sources of audio and textual data found on the web.
Abstract: This paper describes the development of a speech-to-text transcription system for the Finnish language. Finnish is a Finno-Ugric language spoken by about 6 million people living in Finland, but also by some minorities in Sweden, Norway, Russia and Estonia. System development was carried out without any detailed manual transcriptions, relying instead on several sources of audio and textual data found on the web. Some of the audio sources were associated with approximate (and usually partial) texts, which were used to provide estimates of system performance.

18 citations


Book ChapterDOI
16 Aug 2010
TL;DR: Two methods based on statistical machine translation (SMT) are used to generate multiple pronunciations from the canonical pronunciation of a word: the first uses a machine translation tool for phoneme-to-phoneme (p2p) conversion, while the second adapts a pivot method proposed for paraphrase extraction.
Abstract: Multiple-pronunciation dictionaries are often used by automatic speech recognition systems in order to account for different speaking styles. In this paper, two methods based on statistical machine translation (SMT) are used to generate multiple pronunciations from the canonical pronunciation of a word. In the first method, a machine translation tool is used to perform phoneme-to-phoneme (p2p) conversion and derive variants from a given canonical pronunciation. The second method is based on a pivot method proposed for the paraphrase extraction task. The two methods are compared under different training conditions which allow single or multiple pronunciations in the training set, and their performance is evaluated in terms of recall and precision measures.
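The recall and precision measures used above compare the set of generated pronunciation variants against a reference set. A minimal sketch, with hypothetical phone strings standing in for real dictionary entries:

```python
def variant_recall_precision(generated, reference):
    """Recall and precision of generated pronunciation variants
    against a reference set (both given as sets of phone strings)."""
    generated, reference = set(generated), set(reference)
    hits = generated & reference
    recall = len(hits) / len(reference) if reference else 0.0
    precision = len(hits) / len(generated) if generated else 0.0
    return recall, precision

# Hypothetical variants for an English word (phones are illustrative)
ref = {"ih n t r ah s t", "ih n t er ah s t", "ih n t r ih s t"}
gen = {"ih n t r ah s t", "ih n t er ah s t", "ih n er ah s t"}
# 2 of 3 reference variants recovered (recall 2/3);
# 2 of 3 generated variants are correct (precision 2/3)
```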

17 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper explores three approaches (adaptation, full training, and feature merging) to using condition-specific MLP features in a state-of-the-art BN STT system for French; the feature-merging approach without condition-specific adaptation was found to outperform the original models with condition-specific adaptation.
Abstract: It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multi-layer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions, and if so, how to incorporate the condition-specific MLP features in the system. This paper explores three approaches (adaptation, full training, and feature merging) to use condition-specific MLP features in a state-of-the-art BN STT system for French. The third approach without condition-specific adaptation was found to outperform the original models with condition-specific adaptation, and to perform almost as well as full training of multiple condition-specific HMMs.
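Feature merging, the third approach above, amounts to concatenating the MLP-derived features with the standard cepstral stream at the frame level before acoustic model training. A minimal sketch, assuming the two streams are already frame-synchronous (the function name and dimensions are illustrative, not from the paper):

```python
import numpy as np

def merge_features(cepstral, mlp):
    """Frame-level feature merging: concatenate a cepstral stream (T, D1)
    with MLP-derived features (T, D2) into a single (T, D1 + D2) stream.
    Both streams must have the same number of frames T."""
    if cepstral.shape[0] != mlp.shape[0]:
        raise ValueError("streams must be frame-synchronous")
    return np.hstack([cepstral, mlp])
```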

11 citations


Proceedings Article
01 Jan 2010
TL;DR: This paper reports on the impact of the so-called acoustic scale factor on system accuracy when using lattice-based training, and on the use of n-gram cutoff and entropy pruning techniques.
Abstract: This paper investigates various techniques to improve the estimation of n-gram phonotactic models for language recognition using single-best phone transcriptions and phone lattices. More precisely, we first report on the impact of the so-called acoustic scale factor on the system accuracy when using lattice-based training, and then we report on the use of n-gram cutoff and entropy pruning techniques. Several system configurations are explored, such as the use of context-independent and context-dependent phone models, the use of single-best phone hypotheses versus phone lattices, and the use of various n-gram orders. Experiments are conducted using the LRE 2007 evaluation data and the results are reported using the a posteriori EER. The results show that the impact of these techniques on the system accuracy is highly dependent on the training conditions and that careful optimization can lead to performance improvements.
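Two of the ingredients above can be sketched together: in lattice-based training, expected phone n-gram counts are accumulated from alternative paths whose posteriors are sharpened or flattened by the acoustic scale factor, and a count cutoff then drops low-count n-grams before model estimation. This toy version scores a handful of explicit paths rather than a real lattice, and the scale and cutoff values are illustrative.

```python
import math
from collections import Counter

def expected_ngram_counts(paths, scale=0.1, n=2):
    """Expected phone n-gram counts from alternative decoding paths.

    paths: list of (phone_sequence, acoustic_log_score) pairs; path
    posteriors are softmax(scale * log_score), so a larger acoustic
    scale factor lets the best path dominate more sharply.
    """
    m = max(scale * s for _, s in paths)
    post = [math.exp(scale * s - m) for _, s in paths]
    z = sum(post)
    counts = Counter()
    for (phones, _), p in zip(paths, post):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += p / z
    return counts

def apply_cutoff(counts, cutoff=0.5):
    """n-gram cutoff: drop n-grams whose expected count is at or below it."""
    return {g: c for g, c in counts.items() if c > cutoff}
```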

10 citations


01 Jan 2010
TL;DR: This paper reports on ongoing work to take Luxembourgish on board as an e-language, an electronically searchable spoken language, focusing on the issue of producing acoustic seed models for Luxembourgish.
Abstract: The national language of the Grand-Duchy of Luxembourg, Luxembourgish, has often been characterized as one of Europe's under-described and under-resourced languages. In this contribution we report on our ongoing work to take Luxembourgish on board as an e-language: an electronically searchable spoken language. More specifically, we focus on the issue of producing acoustic seed models for Luxembourgish. A phonemic inventory was defined and linked to inventories from major neighboring languages (German, French and English) with the help of the IPA symbol set. Acoustic seed model sets were composed using monolingual German, French or English acoustic model sets, and the corresponding forced alignment segmentations were compared. Next, a super-set of multilingual acoustic seeds was used, putting together the three language-dependent sets. The language identity of the aligned acoustic models provides information about the overall acoustic adequacy of both the cross-language phonemic correspondences and the acoustic models. Furthermore, some information can be gleaned on inter-language distances: the German acoustic models provided the best match, with 54.3% of the segments aligned using German seeds, 35.3% using the English ones and only 10.4% using the French acoustic models. Since Luxembourgish is considered a Western Germanic language close to German, this result is in line with its linguistic typology.
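The IPA-based linking of inventories can be pictured as a lookup table from each target-language phoneme to a seed model in one of the donor languages. The table fragment below is hypothetical (the actual correspondences are defined in the paper, not reproduced here); the sketch only shows the bookkeeping of mapped versus uncovered phonemes.

```python
# Hypothetical fragment of an IPA correspondence table: each Luxembourgish
# phoneme (IPA symbol) is linked to a seed model from German (de),
# French (fr) or English (en) via the shared or closest IPA symbol.
SEED_TABLE = {
    "ɑ": ("de", "a"),    # open back vowel, German seed
    "æ": ("en", "ae"),   # near-open front vowel, English-only seed
    "ʒ": ("fr", "zh"),   # voiced postalveolar fricative, French seed
    "ʃ": ("de", "sch"),
}

def seed_models(phonemes):
    """Map a phoneme inventory to (language, model) seed pairs,
    collecting phonemes with no correspondence separately."""
    seeds, missing = {}, []
    for ph in phonemes:
        if ph in SEED_TABLE:
            seeds[ph] = SEED_TABLE[ph]
        else:
            missing.append(ph)
    return seeds, missing
```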

5 citations


Proceedings ArticleDOI
01 Nov 2010
TL;DR: The goal of this work is to assess the capacity of random forest language models estimated on a very large text corpus to improve the performance of an STT system; a Forest of Random Forests language modeling scheme is introduced.
Abstract: The goal of this work is to assess the capacity of random forest language models estimated on a very large text corpus to improve the performance of an STT system. Previous experiments with random forests were mainly concerned with small or medium size data tasks. In this work the development version of the 2009 LIMSI Mandarin Chinese STT system was chosen as a challenging baseline to improve upon. This system is characterized by a language model trained on a very large text corpus (over 3.2 billion segmented words) making the baseline 4-gram estimates particularly robust. We observed moderate perplexity and CER improvements when this model is interpolated with a random forest language model. In order to attain the goal we tried different strategies to build random forests on the available data and introduced a Forest of Random Forests language modeling scheme. However, the improvements we get for large data over a well-tuned baseline N-gram model are less impressive than those reported for smaller data tasks.
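The interpolation step above combines the baseline 4-gram model and the random forest model word by word, and the gain is measured in perplexity. A minimal sketch of that measurement, given each model's per-word probabilities over a test sequence (the interpolation weight is illustrative, not the paper's tuned value):

```python
import math

def interpolated_perplexity(probs_a, probs_b, lam=0.5):
    """Perplexity of the linearly interpolated model
    p(w) = lam * p_a(w) + (1 - lam) * p_b(w) over a test sequence,
    given each model's per-word probabilities."""
    assert len(probs_a) == len(probs_b)
    log_sum = sum(math.log(lam * pa + (1 - lam) * pb)
                  for pa, pb in zip(probs_a, probs_b))
    return math.exp(-log_sum / len(probs_a))
```

If both models assign uniform probability 1/4 to every word, the interpolated perplexity is 4; mixing in a model that assigns higher probabilities lowers it.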

3 citations


Proceedings Article
01 May 2010
TL;DR: A methodology for a semi-automatic evaluation of QAST systems based on time slot comparisons is introduced, along with the QAST Evaluation Package 2007-2009 resulting from these evaluation campaigns.
Abstract: Question Answering (QA) technology aims at providing relevant answers to natural language questions. Most Question Answering research has focused on mining document collections containing written texts to answer written questions. In addition to written sources, a large (and growing) amount of potentially interesting information appears in spoken documents, such as broadcast news, speeches, seminars, meetings or telephone conversations. The QAST track (Question-Answering on Speech Transcripts) was introduced in CLEF to investigate the problem of question answering in such audio documents. This paper describes in detail the evaluation protocol and tools designed and developed for the CLEF-QAST evaluation campaigns that took place between 2007 and 2009. We first review the data, question sets, and submission procedures that were produced or set up during these three campaigns. As for the evaluation procedure, the interface that was developed to ease the assessors' work is described. In addition, this paper introduces a methodology for a semi-automatic evaluation of QAST systems based on time slot comparisons. Finally, the QAST Evaluation Package 2007-2009 resulting from these evaluation campaigns is also introduced.
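The time slot comparison idea can be sketched as an interval overlap test: a system answer, located in the audio by its start and end times, is counted correct if its span overlaps a reference slot. This is a simplified illustration of the principle, not the CLEF-QAST protocol itself; the tolerance parameter is a hypothetical widening of the reference slot.

```python
def slot_match(answer, reference, tolerance=0.0):
    """True if the answer's time span (start, end), in seconds,
    overlaps the reference slot, optionally widened by a tolerance."""
    a_start, a_end = answer
    r_start = reference[0] - tolerance
    r_end = reference[1] + tolerance
    return a_start <= r_end and a_end >= r_start

def judge(answers, references, tolerance=0.0):
    """Mark each system answer correct if it matches any reference slot."""
    return [any(slot_match(a, r, tolerance) for r in references)
            for a in answers]
```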

3 citations


Proceedings Article
01 Jan 2010
TL;DR: Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models with only a limited usage of French.
Abstract: Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and has often been viewed as one of Europe's under-resourced languages. We focus on the acoustic modeling of Luxembourgish. By taking advantage of monolingual acoustic seeds selected from German, French or English model sets via IPA symbol correspondences, we investigated whether Luxembourgish spoken words were globally better represented by one of these languages. Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models, with only a limited usage of French. German models provided the best match with 54% of the data, versus 35% for English and only 11% for French models. A set of multilingual acoustic models, estimated on the pooled German, French, and English audio data, captured 27% to 48% of the data depending on conditions. Index Terms: multilingual alignment, acoustic seed models, under-resourced languages, Luxembourgish, English, French, German.

2 citations


Book ChapterDOI
16 Aug 2010
TL;DR: In this work the random forest language modeling approach is applied with the aim of improving the performance of the highly competitive LIMSI Mandarin Chinese speech-to-text system, and a Forest of Random Forests language modeling scheme is introduced.
Abstract: In this work the random forest language modeling approach is applied with the aim of improving the performance of the highly competitive LIMSI Mandarin Chinese speech-to-text system. The experimental setup is that of the GALE Phase 4 evaluation. This setup is characterized by a large amount of available language model training data (over 3.2 billion segmented words). A conventional unpruned 4-gram language model with a vocabulary of 56K words serves as a baseline that is challenging to improve upon. However, moderate perplexity and CER improvements over this model were obtained with a random forest language model. Different random forest training strategies were explored so as to attain the maximal gain in performance, and a Forest of Random Forests language modeling scheme is introduced.

2 citations