Author

Patrick Paroubek

Bio: Patrick Paroubek is an academic researcher from the Centre national de la recherche scientifique. The author has contributed to research in topics including parsing and sentiment analysis. The author has an h-index of 18 and has co-authored 80 publications receiving 3,454 citations. Previous affiliations of Patrick Paroubek include the University of Paris and the University of Nantes.


Papers
Proceedings Article
01 May 2010
TL;DR: This paper shows how to automatically collect a corpus for sentiment analysis and opinion mining purposes and builds a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document.
Abstract: Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life every day. Therefore, microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, only a few research works have been devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and perform better than previously proposed methods. In our research, we worked with English; however, the proposed technique can be used with any other language.
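The pipeline described in this abstract (collect emoticon-labelled tweets, then train an n-gram classifier) can be illustrated in a few lines. The snippet below is a rough sketch, not the authors' implementation: fetch_tweets is a hypothetical stand-in for whatever Twitter search client is available, and the query strings and the neutral-text source are illustrative assumptions.

```python
# Sketch of emoticon-based corpus collection and sentiment classification.
# fetch_tweets() is a hypothetical helper standing in for a real Twitter client.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def fetch_tweets(query, n=1000):
    """Hypothetical: return up to n tweet texts matching `query`."""
    raise NotImplementedError("plug in a Twitter API client here")

def build_corpus():
    # Emoticons act as noisy sentiment labels, so no manual annotation is needed.
    positive = fetch_tweets(":)")
    negative = fetch_tweets(":(")
    neutral = fetch_tweets("from:nytimes")  # objective texts, e.g. newspaper accounts
    texts = positive + negative + neutral
    labels = (["positive"] * len(positive)
              + ["negative"] * len(negative)
              + ["neutral"] * len(neutral))
    return texts, labels

def train_classifier(texts, labels):
    # Binary unigram/bigram presence features fed to a multinomial Naive Bayes model.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2), binary=True),
                          MultinomialNB())
    model.fit(texts, labels)
    return model
```

A trained model can then label an unseen document with `model.predict([text])`.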

2,570 citations

Proceedings Article
15 Jul 2010
TL;DR: This system uses text messages from Twitter, a popular microblogging platform, for building a dataset of emotional texts and classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context.
Abstract: In this paper, we describe our system which participated in the SemEval 2010 task of disambiguating sentiment ambiguous adjectives for Chinese. Our system uses text messages from Twitter, a popular microblogging platform, for building a dataset of emotional texts. Using the built dataset, the system classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context. Our approach is fully automatic. It does not require any additional hand-built language resources and it is language independent.
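A context-window polarity decision for an ambiguous adjective could look roughly like the sketch below, assuming a positive/negative text classifier (such as one trained on emotion-labelled tweets as above) is already available; the function name and window size are illustrative assumptions, not the system's actual interface.

```python
def adjective_polarity(sentence, adjective, model, window=5):
    """Guess the polarity of an ambiguous adjective from its local context
    by classifying the surrounding words with a pre-trained sentiment model."""
    tokens = sentence.split()
    if adjective not in tokens:
        return None
    i = tokens.index(adjective)
    context = " ".join(tokens[max(0, i - window): i + window + 1])
    return model.predict([context])[0]  # e.g. "positive" or "negative"
```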

98 citations

Proceedings Article
01 May 2004
TL;DR: This paper presents and reports on the progress of the EVALDA/MEDIA project, focusing on the recording and annotation protocol of the reference dialogue corpus; the project aims to design and test an evaluation methodology to compare and diagnose the context-dependent and context-independent understanding capability of spoken language dialogue systems.
Abstract: The aim of the MEDIA project is to design and test a methodology for the evaluation of context-dependent and independent spoken dialogue systems. We propose an evaluation paradigm based on the use of test suites from real-world corpora and a common semantic representation and common metrics. This paradigm should allow us to diagnose the context-sensitive understanding capability of dialogue systems. This paradigm will be used within an evaluation campaign involving several sites, all of which will carry out the task of querying information from a database.
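The idea of scoring systems against a common semantic representation can be illustrated with a toy attribute-value comparison. The frame encoding and the error-rate formula below are simplified assumptions for illustration only, not the MEDIA annotation scheme or its official metric.

```python
def slot_errors(reference, hypothesis):
    """Count substitutions, deletions and insertions between two
    attribute -> value dictionaries standing in for semantic frames."""
    subs = sum(1 for k in reference if k in hypothesis and hypothesis[k] != reference[k])
    dels = sum(1 for k in reference if k not in hypothesis)
    ins = sum(1 for k in hypothesis if k not in reference)
    return subs, dels, ins

def concept_error_rate(pairs):
    """Aggregate an error rate over (reference, hypothesis) frame pairs."""
    errors = total = 0
    for ref, hyp in pairs:
        s, d, i = slot_errors(ref, hyp)
        errors += s + d + i
        total += len(ref)
    return errors / total if total else 0.0
```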

54 citations

Proceedings Article
01 Jan 1998
TL;DR: The GRACE evaluation program aims at applying the Evaluation Paradigm to the evaluation of Part-of-Speech taggers for French; this paper presents the campaign, its four main components (corpus building, tagging procedure, lexicon building, evaluation procedure), and its internal organization.
Abstract: The GRACE evaluation program aims at applying the Evaluation Paradigm to the evaluation of Part-of-Speech taggers for French. An interesting by-product of GRACE is the production of validated language resources necessary for the evaluation. After a brief recall of the origins and the nature of the Evaluation Paradigm, we show how it relates to other national and international initiatives. We then present the now ending GRACE evaluation campaign and describe its four main components (corpus building, tagging procedure, lexicon building, evaluation procedure), as well as its internal organization.

1. The Evaluation Paradigm. The Evaluation Paradigm has been proposed as a means to foster development in research and technology in the field of language engineering. Up to now, it has mostly been used in the United States in the framework of the ARPA and NIST projects on automatic processing of spoken and written language. The paradigm is based on a two-step process: first, create textual or voice data in the form of raw corpora, tagged corpora or lexica, which are then distributed to the main actors in the field of language engineering for the realization of natural language processing tools; these tools address problems like disambiguation, natural language database query, message understanding, automatic translation, dictation, dialog, character recognition, etc. Then, test and compare systems on similar data. The results of the tests and the discussions (within specific workshops, for example) triggered by their publication and comparison constitute a good basis for the evaluation of the pros and cons of the various methods. The resulting synergy is a dynamizing factor for the field of Language Engineering. The Linguistic Data Consortium (whose function is to collect language-related data and to organize their distribution) is a typical illustration of the positive consequences of programs implementing the Evaluation Paradigm. The GRACE evaluation program is meant to be an implementation of the Evaluation Paradigm in the field of morpho-syntactic tagging. As such, it corresponds to an evaluation campaign of Part-of-Speech taggers for French organized within an automated quantitative black-box evaluation framework.

2. The GRACE evaluation program. Started upon the initiative of Joseph Mariani (LIMSI) and Robert Martin (INaLF), GRACE (Grammars and Resources for Analyzers of Corpora and their Evaluation) was part of the French program "Cognition, Communication Intelligente et Ingénierie des Langues" (Cognition, Intelligent Communication and Language Engineering), jointly promoted by the Engineering Sciences and Human Sciences departments of the CNRS (National Center for Scientific Research). The GRACE program was intended to run over a four-year period (1994-1997) and was planned in two phases: a first phase dedicated to Part-of-Speech taggers, and a second phase concerned with work on syntactic analyzers, which has since been abandoned. The first year was devoted to setting up a coordination committee in charge of running the project and a reflection committee. The responsibility of the reflection committee, formed of a panel of experts from various domains, is to define the evaluation protocol, to specify the reference tagset and lexicon, to decide which data will be made available to the participants, and to organize the workshop for the presentation of the final results. The third entity of the GRACE organization regroups all the participants.
They come both from public institutions and industrial corporations. Only participants with fully operational systems were allowed to take part in the evaluation. Furthermore, only the participants who agreed to describe how their system works (at least during a workshop whose attendance would be restricted to the participants alone) were authorized to take part in the workshop concluding the evaluation campaign. Twenty participants, from both academia and industry, registered at the beginning of the project. During the project, this number slightly decreased: 17 took part in the dry-run and 13 in the final test, the results of which will be published at the beginning of fall '98.

3. Defining the Evaluation Procedure. For the definition and the organization of the GRACE evaluation campaign, we built upon the work done in previous evaluation programs, in particular the evaluation campaigns which have been conducted in the United States, especially in the scope of the ARPA Human Language Technology program. Namely: the MUC conferences (MUC-1, MUC-2, MUC-3 (Sundheim 1991), MUC-4 (MUC 1992)), aiming at the evaluation of message understanding systems; TIPSTER, concerning the evaluation of automated information extraction systems from raw text data; the TREC conferences (Harman 1993; Harman 1994), concerning the evaluation of Information Retrieval systems operating on textual databases; and ParsEval and SemEval, which find their origin in Ezra Black's work (Black 1991; Black 1993; Black 1994) on the evaluation of syntactic analyzers done within the scope of an ACL working group. GRACE also looked at the "Morpholympics" competition (Hauser 1994a; Hauser 1994b), which was organized in spring 1994 at Erlangen University in Germany for the evaluation of German morphological analyzers. MUC and TREC use task-oriented black-box evaluation schemes requiring no knowledge of the internal processes or theoretical underpinning of the systems being tested, while ParsEval and SemEval (some of which will be part of MUC-6) are approaches which attempt to evaluate systems at the module level by using a benchmark method based on a reference corpus annotated with syntactic structures agreed upon by a panel of experts. An additional list of evaluation methods for linguistic software (lingware) now in use in industry can be found in Marc Cavazza's report (in French) for the French Ministry of Education and Research (Cavazza 1994). Another extensive overview of evaluation programs for Natural Language Processing systems is provided in (Sparck Jones and Galliers 1996). Similarly to the evaluation campaigns organized in the United States, GRACE was divided into four phases: 1. training phase ("phase d'entraînement"): distribution of the training data (the training corpus) to the participants for the initial set-up of their systems; 2. dry-run phase ("phase d'essais"): distribution of a first set of data (the dry-run corpus) to the participants for a first real-size test of the evaluation protocol (the task used in the MUC evaluation campaigns was for the systems to fill in predefined frames from texts relating US Navy manœuvres (MUC-1 and MUC-2) or terrorism acts (MUC-3 and MUC-4)); 3. test phase ("phase de test"): distribution of the "real" evaluation data (the test corpus) to the participants and realization of the evaluation; 4.
adjudication phase ("phase d'adjudication"): validation with the participants of the results of the evaluation; this phase leads to the organization of a workshop where all the participants present their methods and their systems and discuss the results of the evaluation. According to the task-oriented approach chosen in GRACE, the evaluation procedure was based on an automated comparison, on a common corpus of literary and journalistic texts, of the PoS tags produced by the various tagging systems against PoS tags manually validated by an expert (tagging is therefore the task selected for evaluation). In addition, as the evaluation procedure has to be applicable to the simultaneous evaluation of several systems (that may very well use various tagging procedures: statistical, rule-based, ...), the definition of the evaluation metrics cannot rely on any presupposition about the internal characteristics of the tagging methods. It therefore has to be exclusively defined in terms of the outputs produced by the systems (pure "black-box" approach), which, in the case of tagging, can be minimally represented as sequences of pairs whose elements are the word token and its tag (or tag list). Such an output is considered to be "minimal" for the tagging task because several taggers also produce additional information beyond the simple tags (e.g. lemmas). In GRACE, we decided not to take such additional information into account (for example, no evaluation of the eventual lemmatization provided by the systems was performed) and to restrict ourselves to the tagging task, defined as aiming at associating one unique tag with each token (and not, for instance, a partially disambiguated list of tags, which would have required a much more complex metric for comparing the systems). However, even with such a minimalistic definition of the tagging task, the actual definition of a working evaluation metric required the GRACE steering committee to take several decisions about various crucial issues: how to compare systems that do not operate on the same tokens (i.e. use different segmentation procedures)? How to take into account the processing of compound forms? How to compare systems that do not use the same tagsets? How to weight in the evaluation the different components that make up any realistic tagging system? In particular, how to evaluate the influence of the capacity of a tagger to handle unknown words? How to evaluate the influence of the quality of the lexical information available? Built upon the evaluation scheme initially proposed by Martin Rajman in (Adda et al. 1995) and then adapted and extended by the GRACE committees, the evaluation procedure used in GRACE is characterized by the following aspects. Dealing with varying tokenizations: the problem of differences in text segmentation between the hand-tagged reference material and the text returned by the participants is a central issue for tagger evaluation. Indeed, in order not to leave complete freedom to the participants regarding the tokenizing algorithm (and the lexicon) used to segment the data, they had to
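At its core, the black-box comparison reduces to scoring a tagger's (token, tag) sequence against the manually validated reference. The sketch below shows only that core; it assumes both sides produced identical tokenizations and tagsets, which is precisely the simplification the GRACE procedure had to go beyond.

```python
def tagging_accuracy(reference, hypothesis):
    """Fraction of tokens whose tag matches the hand-validated reference.
    Both inputs are lists of (token, tag) pairs; identical segmentation is
    assumed, whereas GRACE also had to reconcile differing tokenizations."""
    if len(reference) != len(hypothesis):
        raise ValueError("tokenizations differ; alignment would be needed")
    correct = sum(1 for (rt, rtag), (ht, htag) in zip(reference, hypothesis)
                  if rt == ht and rtag == htag)
    return correct / len(reference)

# Example (illustrative tagset):
# tagging_accuracy([("le", "DET"), ("chat", "NC"), ("dort", "V")],
#                  [("le", "DET"), ("chat", "NC"), ("dort", "NC")])
# -> 0.666...
```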

50 citations

Proceedings Article
01 May 2006
TL;DR: The protocol of EASY, the evaluation campaign for syntactic parsers of French within the EVALDA project of the TECHNOLANGUE program, is presented, together with the results obtained by one participant on half of the corpus as an illustration.
Abstract: This paper presents the protocol of EASY, the evaluation campaign for syntactic parsers of French in the EVALDA project of the TECHNOLANGUE program. We describe the participants, the corpus and its genre partitioning, the annotation scheme, which allows for the annotation of both constituents and relations, the evaluation methodology and, as an illustration, the results obtained by one participant on half of the corpus.
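Since the EASY scheme annotates both constituents and relations, parser output is typically scored with precision, recall and F-measure over the annotated units. The snippet below is a simplified stand-in for that kind of scoring, assuming relations are encoded as hashable tuples; it is not the campaign's official scorer.

```python
def relation_scores(reference, hypothesis):
    """Precision, recall and F-measure over sets of relations, each encoded
    e.g. as (relation_type, source_span, target_span) tuples."""
    ref, hyp = set(reference), set(hypothesis)
    matched = len(ref & hyp)
    precision = matched / len(hyp) if hyp else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```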

49 citations


Cited by
01 Jan 2009

7,241 citations

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
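The comparison described here is straightforward to reproduce in outline with off-the-shelf tools. The sketch below uses scikit-learn, with logistic regression standing in for maximum entropy classification, and assumes the movie reviews are already available as parallel lists of texts and positive/negative labels; it is not the authors' original experimental setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def compare_classifiers(texts, labels):
    """Cross-validated accuracy of three standard text classifiers on
    positive/negative review data, using unigram presence features."""
    classifiers = {
        "naive_bayes": MultinomialNB(),
        "max_entropy": LogisticRegression(max_iter=1000),  # stand-in for MaxEnt
        "svm": LinearSVC(),
    }
    return {name: cross_val_score(make_pipeline(CountVectorizer(binary=True), clf),
                                  texts, labels, cv=3).mean()
            for name, clf in classifiers.items()}
```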

6,980 citations

Journal ArticleDOI
TL;DR: This work investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time and indicates that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others.
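A first pass at relating a daily mood series to subsequent DJIA values is a simple lagged correlation, as sketched below; this is only an illustrative toy under that assumption, not the paper's Granger-causality and neural-network analysis.

```python
import numpy as np

def lagged_correlations(mood, djia, max_lag=5):
    """Pearson correlation between a daily mood series and DJIA values
    `lag` days later (mood leading the market), for lags 1..max_lag."""
    mood = np.asarray(mood, dtype=float)
    djia = np.asarray(djia, dtype=float)
    return {lag: float(np.corrcoef(mood[:-lag], djia[lag:])[0, 1])
            for lag in range(1, max_lag + 1)}
```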

4,453 citations

Journal ArticleDOI
01 Jun 1959

3,442 citations