Showing papers by "Georgios Paltoglou" published in 2010


Journal Issue
TL;DR: SentiStrength, as discussed by the authors, is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5.
Abstract: A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches. © 2010 Wiley Periodicals, Inc.
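
As a rough illustration of the lookup-table idea described in the abstract, the sketch below scores a message on separate positive and negative strength scales of 1 to 5. The lexicon entries, the tokenization, and the "strongest term wins" rule are assumptions made for this example; they are not taken from SentiStrength itself, whose term weights were optimized by machine learning.

```python
# Hypothetical term strengths: positive values for positive emotion,
# negative values for negative emotion. These numbers are invented.
TOY_LEXICON = {"love": 4, "great": 3, "hate": -4, "awful": -4, "sad": -3}


def sentiment_strengths(text, lexicon=TOY_LEXICON):
    """Return (positive, negative) strengths, each on a 1-5 scale.

    1 means no detected emotion on that scale. The strongest matching
    term decides each score (an assumption of this sketch).
    """
    positive, negative = 1, 1
    for raw_token in text.lower().split():
        token = raw_token.strip(".,!?")
        score = lexicon.get(token, 0)
        if score > 0:
            positive = max(positive, min(score, 5))
        elif score < 0:
            negative = max(negative, min(-score, 5))
    return positive, negative


if __name__ == "__main__":
    print(sentiment_strengths("I love it but the ending was awful!"))
```

Reporting the two scales separately means a message containing mixed emotion is not averaged away into a neutral score.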

1,371 citations


Proceedings Article
11 Jul 2010
TL;DR: It is shown that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing.
Abstract: Most sentiment analysis approaches use as baseline a support vector machines (SVM) classifier with binary unigram weights. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing. The techniques are tested on a wide selection of data sets and produce the best accuracy to our knowledge.
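
To make the weighting terminology concrete, here is a minimal sketch of a tf.idf variant with a sublinear term frequency and a smoothed document frequency. The specific formulas (1 + ln tf, and an add-one smoothed idf) are common textbook choices assumed for this example; the abstract does not state which exact variants the paper evaluates.

```python
import math
from collections import Counter


def sublinear_tf(count):
    """Dampen raw term counts: 1 + ln(count), or 0 for absent terms."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def smoothed_idf(n_docs, doc_freq):
    """Add-one smoothed idf (an assumed variant, not the paper's)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


def tfidf_vectors(docs):
    """Return one {term: weight} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(tokenized)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: sublinear_tf(count) * smoothed_idf(n_docs, doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    for vector in tfidf_vectors(["good good movie", "bad movie"]):
        print(vector)
```

The resulting weight dictionaries could replace binary unigram features as input to a classifier such as an SVM.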

355 citations


Journal Article
TL;DR: A systematic way to study blog data by combining approaches from the physics of complex networks with computer science methods of text analysis, identifying topological communities of users clustered around certain popular posts and underlining the role of emotional content in the emergence and evolution of these communities.
Abstract: Online communications at web portals represent technology-mediated user interactions, leading to massive data and potentially new techno-social phenomena not seen in real social mixing. Apart from being dynamically driven, user interactions via posts are indirect, suggesting the importance of the contents of the posted material. We present a systematic way to study blog data by combining approaches from the physics of complex networks with computer science methods of text analysis. We map the blog data onto a bipartite network in which users and posts with comments are the two natural partitions. Using machine learning methods, we classify the texts of posts and comments by their emotional content as positive, negative, or otherwise objective (neutral). Using spectral methods for weighted bipartite graphs, we identify topological communities of users clustered around certain popular posts, and underline the role of emotional content in the emergence and evolution of these communities.
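
The bipartite mapping described in the abstract can be sketched as follows. The input format (a list of user and post identifier pairs, one per comment) and the use of comment counts as edge weights are assumptions for illustration; the spectral community detection step is not shown.

```python
from collections import defaultdict


def build_bipartite(comments):
    """Build a weighted bipartite graph from (user, post) comment events.

    Returns two adjacency maps, one per partition. Each edge weight is
    the number of comments the user left on the post (an assumed choice).
    """
    user_to_posts = defaultdict(lambda: defaultdict(int))
    post_to_users = defaultdict(lambda: defaultdict(int))
    for user, post in comments:
        user_to_posts[user][post] += 1
        post_to_users[post][user] += 1
    return user_to_posts, post_to_users


if __name__ == "__main__":
    events = [("u1", "p1"), ("u1", "p1"), ("u2", "p1"), ("u2", "p2")]
    users, posts = build_bipartite(events)
    print(dict(users["u1"]))  # posts that u1 commented on, with counts
    print(dict(posts["p1"]))  # users who commented on p1, with counts
```

Users never link directly to other users in this representation; they are connected only through the posts they comment on, which matches the indirect interaction the abstract emphasizes.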

74 citations


01 Jan 2010
TL;DR: A number of approaches for detecting whether a textual utterance is of objective or subjective nature and in the latter case detecting the polarity of the utterance (i.e. positive vs. negative) are studied.
Abstract: The ability to correctly identify the existence and polarity of emotion in informal, textual communication is a very important part of a realistic and immersive 3D environment where people communicate with one another through avatars or with an automated system. Such a feature would provide the system the ability to realistically represent the mood and intentions of the participants, thus greatly enhancing their experience. In this paper, we study and compare a number of approaches for detecting whether a textual utterance is of objective or subjective nature and in the latter case detecting the polarity of the utterance (i.e. positive vs. negative). Experiments are carried out on a real corpus of social exchanges in cyberspace and general conclusions are presented.

70 citations


Journal Article
TL;DR: This work maps high-resolution data from digg.com onto a bipartite network of users and their comments on posted stories, identifies user communities centered around certain popular posts, and determines the emotional content of the related comments with an emotion classifier developed for this type of text.
Abstract: Large-scale data resulting from users' online interactions provide the ultimate source of information to study emergent social phenomena on the Web. From individual actions of users to observable collective behaviors, different mechanisms involving emotions expressed in the posted text play a role. Here we combine approaches of statistical physics with machine-learning methods of text analysis to study the emergence of emotional behavior among Web users. Mapping the high-resolution data from digg.com onto a bipartite network of users and their comments on posted stories, we identify user communities centered around certain popular posts and determine the emotional content of the related comments with the emotion classifier developed for this type of text. Applied over different time periods, this framework reveals strong correlations between the excess of negative emotions and the evolution of communities. We observe avalanches of emotional comments exhibiting significant self-organized critical behavior and temporal correlations. To explore the robustness of these critical states, we design a network automaton model on realistic network connections and several control parameters, which can be inferred from the dataset. Dissemination of emotions by a small fraction of very active users appears to critically tune the collective states.

58 citations


Journal Article
TL;DR: In this article, a graphical representation of human emotion extracted from text sentences is presented; the extraction is based on data-mining statistics of large cyberspace databases, and the Poisson distribution is used to transfer database-extracted lexical and language parameters into coherent intensities of valence and arousal.
Abstract: This paper presents a novel concept: a graphical representation of human emotion extracted from text sentences. The major contributions of this paper are the following. First, we present a pipeline that extracts, processes, and renders the emotion of a 3D virtual human (VH). The extraction of emotion is based on data-mining statistics of large cyberspace databases. Second, we propose methods to optimize this computational pipeline so that real-time virtual reality rendering can be achieved on common PCs. Third, we use the Poisson distribution to transfer database-extracted lexical and language parameters into coherent intensities of valence and arousal, the parameters of Russell’s circumplex model of emotion. The last contribution is a practical color interpretation of emotion that influences the emotional aspect of rendered VHs. To test our method’s efficiency, computational statistics related to classical or untypical cases of emotion are provided. In order to evaluate our approach, we applied our method to diverse areas such as cyberspace forums, comics, and theater dialogs.

38 citations


Journal Article
TL;DR: The algorithm is tested in a variety of testbeds in both recall- and precision-oriented settings, and its performance is found to be better than or at least equal to previous state-of-the-art approaches, overall constituting a very effective and robust solution.

12 citations


Posted Content
24 Nov 2010
TL;DR: In this article, the authors present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts.
Abstract: We present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts. Nearly 2.5 million posts from over 18 thousand users were investigated. Scale-free distributions were observed for activity in individual discussion threads as well as for overall activity. The number of unique users in a thread normalized by the thread length decays with thread length, suggesting that thread life is sustained by mutual discussions rather than by independent comments. Automatic sentiment analysis shows that most posts contain negative emotions and the most active users in individual threads express predominantly negative sentiments. It follows that the average emotion of longer threads is more negative and that threads can be sustained by negative comments. An agent-based computer simulation model has been used to reproduce several essential characteristics of the analyzed system. The model stresses the role of discussions between users, especially emotionally laden quarrels between supporters of opposite opinions, and reproduces many observed statistics of the forum.
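
The thread statistic mentioned in the abstract (unique users normalized by thread length) can be computed as in the sketch below. Representing a thread as a list of author identifiers, and averaging the ratio over threads of equal length, are assumptions made here for illustration; the example data is invented.

```python
from collections import defaultdict


def unique_user_ratio(authors):
    """Number of distinct authors divided by the number of posts."""
    if not authors:
        raise ValueError("thread must contain at least one post")
    return len(set(authors)) / len(authors)


def mean_ratio_by_length(threads):
    """Average the unique-user ratio over threads of the same length."""
    buckets = defaultdict(list)
    for authors in threads:
        buckets[len(authors)].append(unique_user_ratio(authors))
    return {
        length: sum(ratios) / len(ratios)
        for length, ratios in sorted(buckets.items())
    }


if __name__ == "__main__":
    toy_threads = [["a", "b"], ["a", "a"], ["a", "b", "a", "b"]]
    print(mean_ratio_by_length(toy_threads))
```

A ratio that falls as threads get longer indicates that the same users keep returning to the thread, which is the pattern the abstract interprets as mutual discussion.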

11 citations