Patrick Paroubek

Researcher at Centre national de la recherche scientifique

Publications -  88
Citations -  3629

Patrick Paroubek is an academic researcher at the Centre national de la recherche scientifique. The author has contributed to research on the topics of parsing and sentiment analysis. The author has an h-index of 18 and has co-authored 80 publications receiving 3454 citations. Previous affiliations of Patrick Paroubek include University of Paris and University of Nantes.

Papers
Proceedings Article

A Protocol for Evaluating Analyzers of Syntax (PEAS)

TL;DR: This paper presents PEAS: a Protocol for Evaluating Analyzers of Syntax (in French: Protocole d’Evaluation pour les Analyseurs Syntaxiques), based on an ongoing experiment at LIMSI which aims at developing and testing a generic quantitative black-box evaluation protocol for parsers of French.
Proceedings Article

Automatic Audio and Manual Transcripts Alignment, Time-code Transfer and Selection of Exact Transcripts

TL;DR: This study makes use of 10 hours of French radio interview archives with corresponding press-oriented transcripts to generate automatic transcripts of sibling resources of audio and written documents, such as those available in audio archives or for parliamentary debates.

The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project

TL;DR: The MAPA project, funded under the Connecting Europe Facility programme, is described, whose goal is the development of an open-source de-identification toolkit for all official European Union languages.
Proceedings Article

NLP Analytics in Finance with DoRe: A French 250M Tokens Corpus of Corporate Annual Reports.

TL;DR: This paper describes the construction of the DoRe corpus, which is designed to be as modular as possible in order to allow for maximum reuse in different tasks pertaining to Economics, Finance and Regulation, and discusses the spectrum of possible uses of this new resource for NLP applications.
Proceedings Article

Rediscovering 50 years of discoveries in speech and language processing: A survey

TL;DR: The NLP4NLP corpus is created to study the content of scientific publications in the field of speech and natural language processing, comprising 65,000 documents, gathering 50,000 authors, including 325,000 references and representing approximately 270 million words.