Journal Article

Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter

07 Dec 2011-PLOS ONE (Public Library of Science)-Vol. 6, Iss: 12
TL;DR: Expressions made on the online, global microblog and social networking service Twitter are examined, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years.
Abstract: Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33 month span by over 63 million unique users. In measuring happiness, we construct a tunable, real-time, remote-sensing, and non-invasive, text-based hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage, and we show how a highly robust and tunable metric can be constructed and defended.
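
As a rough illustration of how a text-based hedonometer of this kind might work, the sketch below scores a text as the frequency-weighted mean of per-word happiness ratings. The abstract does not spell out the formula, so the weighting scheme, the 1 to 9 rating scale with a neutral point of 5, and the `delta_h` tuning parameter (which drops near-neutral words) are assumptions made for illustration. The function name and the word scores are hypothetical and are not taken from the paper's word list.

```python
from collections import Counter

NEUTRAL = 5.0  # assumed midpoint of a 1-9 happiness scale


def average_happiness(text, word_scores, delta_h=1.0):
    """Frequency-weighted mean happiness of the scored words in `text`.

    Words without a score are ignored. Words whose score lies strictly
    within `delta_h` of NEUTRAL are also ignored; this is the assumed
    "tuning" knob. Returns None when no word contributes.
    """
    counts = Counter(text.lower().split())
    weighted_sum = 0.0
    total_freq = 0
    for word, freq in counts.items():
        score = word_scores.get(word)
        if score is None:
            continue  # word is not in the scored list
        if abs(score - NEUTRAL) < delta_h:
            continue  # near-neutral word, excluded by the tuning band
        weighted_sum += score * freq
        total_freq += freq
    return weighted_sum / total_freq if total_freq else None


# Hypothetical scores, for illustration only.
example_scores = {"love": 8.0, "the": 5.0, "hate": 2.0}
print(average_happiness("the love love hate", example_scores))
```

Widening `delta_h` makes the measure depend on fewer, more emotionally charged words; setting it to 0 keeps every scored word.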


Citations
Journal Article
01 Jun 1959

3,442 citations

Journal Article
25 Sep 2013-PLOS ONE
TL;DR: This represents the largest study, by an order of magnitude, of language and personality, and found striking variations in language with personality, gender, and age.
Abstract: We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.

1,435 citations

Proceedings Article
01 Aug 2017
TL;DR: Crowdsourcing on Amazon Mechanical Turk was used to label a large Twitter training dataset, along with additional test sets of Twitter and SMS messages, for two subtasks: A, an expression-level subtask, and B, a message-level subtask.
Abstract: This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.

1,107 citations


Cites background from "Temporal Patterns of Happiness and ..."

  • ..., 2010), social science (Dodds et al., 2011), and market research (Burton and Soboleva, 2011; Qureshi et al....


17 Dec 2010
TL;DR: The authors survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, using a corpus of digitized texts containing about 4% of all books ever printed.
Abstract: The article, published in Science, on one of the first analytical uses of Google Books, based on n-grams (Google Ngrams). We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can ...

735 citations

Proceedings Article
01 Jun 2016
TL;DR: This paper discusses the fourth year of the Sentiment Analysis in Twitter task. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions; the three new subtasks focus on two variants of the basic sentiment classification in Twitter task.
Abstract: This paper discusses the fourth year of the “Sentiment Analysis in Twitter Task”. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. The first two subtasks are reruns from prior years and ask to predict the overall sentiment, and the sentiment towards a topic in a tweet. The three new subtasks focus on two variants of the basic “sentiment classification in Twitter” task. The first variant adopts a five-point scale, which confers an ordinal character to the classification task. The second variant focuses on the correct estimation of the prevalence of each class of interest, a task which has been called quantification in the supervised learning literature. The task continues to be very popular, attracting a total of 43 teams.

702 citations

References
Journal Article
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Abstract: In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases.

65,425 citations

Journal Article
01 Jan 1949-Nature
TL;DR: In this article, the authors define and examine a measure of concentration in terms of population constants, and examine the relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution.
Abstract: The 'characteristic' defined by Yule [1] and the 'index of diversity' defined by Fisher [2] are two measures of the degree of concentration or diversity achieved when the individuals of a population are classified into groups. Both are defined as statistics to be calculated from sample data and not in terms of population constants. The index of diversity has so far been used chiefly with the logarithmic distribution. It cannot be used everywhere, as it does not always give values which are independent of sample size; it cannot do so, for example, when applied to an infinite population of individuals classified into a finite number of groups. Williams [3] has pointed out a relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution. The present purpose is to define and examine a measure of concentration in terms of population constants.

10,077 citations

Book
01 Jan 1957
TL;DR: In this article, the authors deal with the nature and theory of meaning and present a new, objective method for its measurement which they call the semantic differential, which can be adapted to a wide variety of problems in such areas as clinical psychology, social psychology, linguistics, mass communications, esthetics, and political science.
Abstract: In this pioneering study, the authors deal with the nature and theory of meaning and present a new, objective method for its measurement which they call the semantic differential. This instrument is not a specific test, but rather a general technique of measurement that can be adapted to a wide variety of problems in such areas as clinical psychology, social psychology, linguistics, mass communications, esthetics, and political science. The core of the book is the authors' description, application, and evaluation of this important tool and its far-reaching implications for empirical research.

9,476 citations

Proceedings Article
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Book
01 Jan 1949

5,898 citations