Showing papers by "Patrick Paroubek" published in 2004


Proceedings Article
01 May 2004
TL;DR: This paper reports on the progress of the EVALDA/MEDIA project, focusing on the recording and annotation protocol of the reference dialogue corpus. The project aims to design and test an evaluation methodology to compare and diagnose the context-dependent and context-independent understanding capability of spoken language dialogue systems.
Abstract: The aim of the MEDIA project is to design and test a methodology for the evaluation of context-dependent and independent spoken dialogue systems. We propose an evaluation paradigm based on the use of test suites from real-world corpora and a common semantic representation and common metrics. This paradigm should allow us to diagnose the context-sensitive understanding capability of dialogue systems. This paradigm will be used within an evaluation campaign involving several sites, all of which will carry out the task of querying information from a database.

54 citations


Proceedings Article
01 May 2004
TL;DR: This paper presents EASY (Evaluation of Analyzers of SYntax), an ongoing evaluation campaign of syntactic parsing of French, a subproject of EVALDA in the French TECHNOLANGUE program.
Abstract: This paper presents EASY (Evaluation of Analyzers of SYntax), an ongoing evaluation campaign of syntactic parsing of French, a subproject of EVALDA in the French TECHNOLANGUE program. After presenting the elaboration of the annotation formalism, we describe the corpus building steps, the annotation tools, the evaluation measures and, finally, plans to produce a validated large linguistic resource, syntactically annotated.

14 citations


01 Jan 2004
TL;DR: The EVALDA/MEDIA project defines and tests a methodology for evaluating the context-sensitive understanding capability of spoken language dialogue systems from both academic organizations (CLIPS, IRIT, LIA, LIMSI, LORIA, VALORIA) and industrial sites (FRANCE TELECOM R et D, TELIP); the paper focuses on the recording protocol of the reference dialogue corpus.
Abstract: This paper presents and reports on the progress of the EVALDA/MEDIA project, focusing on the recording protocol of the reference dialogue corpus. The aim of this project is to define and test an evaluation methodology that assesses and diagnoses the context-sensitive understanding capability of spoken language dialogue systems. Systems from both academic organizations (CLIPS, IRIT, LIA, LIMSI, LORIA, VALORIA) and industrial sites (FRANCE TELECOM R et D, TELIP) will be evaluated. ELDA is the coordinator of the Technolangue/EVALDA multicampaign evaluation project, a national initiative sponsored by the French government, of which MEDIA is a sub-campaign. MEDIA began in January 2003. VECSYS provides the recording platform for the project.

7 citations


Proceedings Article
01 May 2004
TL;DR: This study uses 10 hours of French radio interview archives with corresponding press-oriented transcripts to study the automatic processing of sibling audio and written resources, such as those available in audio archives or for parliamentary debates.
Abstract: The present study focuses on automatic processing of sibling resources of audio and written documents, such as those available in audio archives or for parliamentary debates: the written texts are close but not exact audio transcripts. Such resources deserve attention for several reasons: they represent an interesting testbed for studying differences between written and spoken material, and they yield low-cost resources for acoustic model training. When the audio data are transcribed automatically, regions of agreement between automatic transcripts and written sources allow time codes to be transferred to the written documents: this may be helpful in an audio archive or audio information retrieval environment. Regions of disagreement can be automatically selected for further correction by human transcribers. This study makes use of 10 hours of French radio interview archives with corresponding press-oriented transcripts. The audio corpus was then transcribed using the LIMSI speech recognizer, resulting in automatic transcripts with an average word error rate of 12%. 80% of the text corpus (with word chunks of at least five words) can be exactly aligned with the automatic transcripts of the audio data. The residual word error rate on these 80% is less than 1%.
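The two quantities named in this abstract, regions of exact agreement of at least five words and word error rate, can be illustrated with a minimal Python sketch. The abstract does not say how the alignment was computed, so the use of `difflib.SequenceMatcher` here is an assumption and not the authors' method. The function names `agreement_regions` and `word_error_rate` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def agreement_regions(written, automatic, min_words=5):
    """Return (written_start, automatic_start, length) for every run of
    identical words of at least `min_words` words.

    Assumption: difflib's matcher stands in for whatever aligner the
    study actually used.
    """
    matcher = SequenceMatcher(a=written, b=automatic, autojunk=False)
    return [
        (block.a, block.b, block.size)
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Assumes a non-empty reference word list.
    """
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(reference)


# Hypothetical sample data: the automatic transcript keeps a hesitation
# that the press-oriented text dropped.
written = "the minister said that the budget will be voted next week".split()
automatic = "the minister said that uh the budget will be voted next week".split()

regions = agreement_regions(written, automatic)
wer = word_error_rate(written, automatic)
```

In a setting like the one described, the time codes of the automatic words inside each returned region could be copied to the matching written words, and everything outside the regions would be a candidate for manual correction.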

4 citations


01 Apr 2004
TL;DR: This paper presents the influence of the working area that a software entity has at its disposal to predict the future state of its environment.
Abstract: A promising new line of research has emerged in recent years, which proposes using evolutionary (Kirby S., 2003) and Darwinian (Edelman G. M., 1992) principles so that communities of machines develop a language. In this vein, we took an interest in the experiments of (Steels L., 1996), which concern one of the simplest but also most essential elements of language: the lexicon. Our goal is to study the development of a model of the mechanisms of language understanding. More specifically, we try to identify the functional prerequisites needed for software entities to evolve a language from their receptions (perceptions) and effections (actions), which are products of their interactions with their environment, in an unsupervised or lightly constrained mode, for example with a mixed population of software and human agents (Steels L., Kaplan F., 1999).