
Showing papers by "Patrick Paroubek" published in 2010


Proceedings Article
01 May 2010
TL;DR: This paper shows how to automatically collect a corpus for sentiment analysis and opinion mining and builds a sentiment classifier that can determine positive, negative, and neutral sentiment for a document.
Abstract: Microblogging has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life every day, so microblogging websites are rich sources of data for opinion mining and sentiment analysis. Because microblogging appeared relatively recently, only a few research works have been devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform a linguistic analysis of the collected corpus and explain the phenomena we discovered. Using the corpus, we build a sentiment classifier that is able to determine positive, negative, and neutral sentiment for a document. Experimental evaluations show that our proposed techniques are efficient and perform better than previously proposed methods. In our research, we worked with English; however, the proposed technique can be used with any other language.

2,570 citations


Proceedings Article
15 Jul 2010
TL;DR: This system uses text messages from Twitter, a popular microblogging platform, for building a dataset of emotional texts and classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context.
Abstract: In this paper, we describe our system, which participated in the SemEval 2010 task of disambiguating sentiment ambiguous adjectives for Chinese. Our system uses text messages from Twitter, a popular microblogging platform, to build a dataset of emotional texts. Using this dataset, the system classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context. Our approach is fully automatic. It does not require any additional hand-built language resources, and it is language independent.

98 citations


Proceedings Article
01 May 2010
TL;DR: This paper presents the two-level opinion and sentiment model that will be used for evaluation in the DOXA project, along with the annotation interface the authors use for hand-annotating a reference corpus.
Abstract: After presenting the state of the art in opinion and sentiment analysis and the DOXA project, we review the few evaluation campaigns that have dealt with opinion mining in the past. We then present the two-level opinion and sentiment model that we will use for evaluation in the DOXA project, and the annotation interface we use for hand-annotating a reference corpus. Finally, we present the corpus that will be used in DOXA and report on the hand-annotation of a corpus of comments on video games, together with the solution adopted to obtain a sufficient level of inter-annotator agreement.

19 citations


Proceedings Article
01 May 2010
TL;DR: This paper describes the XML format chosen for PASSAGE, shows that it complies with the latest proposals for linguistic annotation standards, and discusses the influence that corpus-based evaluation has on the characteristics of a syntactic representation when the aim is to assess the performance of any kind of parser.
Abstract: The current PASSAGE syntactic representation is the result of 9 years of constant evolution, with the aim of providing a common ground for evaluating parsers of French regardless of their type and underlying theory. In this paper, we present the latest developments concerning the formalism. First, through a review of basic linguistic phenomena, we show that it is a plausible minimal common ground for representing French syntax in the context of generic black-box quantitative objective evaluation. For the phenomena reviewed, which include the notion of syntactic head, apposition, control, and coordination, we explain how the PASSAGE representation relates to other syntactic representation schemes for French and English, slightly extending the annotation to cover English when needed. Second, we describe the XML format chosen for PASSAGE and show that it complies with the latest proposals for linguistic annotation standards. We conclude by discussing the influence that corpus-based evaluation has on the characteristics of a syntactic representation when the aim is to assess the performance of any kind of parser.

16 citations