Showing papers by "Patrick Paroubek" published in 2005


10 Sep 2005
TL;DR: The reported study aims at increasing the understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration, using 9 hours of French broadcast interview archives involving 10 journalists and 10 personalities from political or civil society.
Abstract: The reported study aims at increasing our understanding of spontaneous speech-related phenomena from sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. They were then aligned with automatic transcripts, using the LIMSI speech recogniser. This facilitated the production of exact transcripts, where all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers’ status? And what are the most frequent disfluency patterns?

28 citations



09 Mar 2005
TL;DR: Most current question-answering systems adopt roughly the same type of architecture, which can be outlined as three modules; it is therefore important to evaluate the contribution of the tools and knowledge bases they use.
Abstract: Most question-answering systems currently being developed adopt more or less the same type of architecture, which can be outlined as three modules: question analysis, document selection, and answer extraction. Where they differ is in the tools (indexing engine, parsers...) and the knowledge bases they use. For each of these systems, it is therefore important to evaluate the contribution of these tools or knowledge bases. As part of the Equer campaign (an evaluation campaign for French question-answering systems), our system FRASQUES produced two sets of results: one uses synonyms in bi-terms, the other uses them for mono-terms as well. Comparing these two runs, together with the study of a larger corpus in French and English, makes it possible to measure the contribution of this semantic knowledge.

1 citation