Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter

doi:10.1371/JOURNAL.PONE.0026752

Journal Article

Peter Sheridan Dodds¹, Kameron Decker Harris¹, Isabel M. Kloumann¹, Catherine A. Bliss¹, Christopher M. Danforth¹

University of Vermont¹

07 Dec 2011-PLOS ONE (Public Library of Science)-Vol. 6, Iss: 12

TL;DR: Expressions made on the online, global microblog and social networking service Twitter are examined, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years.

Abstract: Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33 month span by over 63 million unique users. In measuring happiness, we construct a tunable, real-time, remote-sensing, and non-invasive, text-based hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage, and we show how a highly robust and tunable metric can be constructed and defended.

Citations

Journal Article

The measurement of meaning

John M. Kittross

01 Jun 1959

3,442 citations

Journal Article

Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

H. Andrew Schwartz¹, Johannes C. Eichstaedt¹, Margaret L. Kern¹, Lukasz Dziurzynski¹, Stephanie M. Ramones¹, Megha Agrawal¹, Achal Shah¹, Michal Kosinski², David Stillwell², Martin E. P. Seligman¹, Lyle H. Ungar¹

University of Pennsylvania¹, University of Cambridge²

25 Sep 2013-PLOS ONE

TL;DR: This represents the largest study, by an order of magnitude, of language and personality, and found striking variations in language with personality, gender, and age.

Abstract: We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.

1,435 citations

Proceedings Article

SemEval-2017 Task 4: Sentiment Analysis in Twitter

Sara Rosenthal¹, Noura Farra¹, Preslav Nakov²

Columbia University¹, Qatar Computing Research Institute²

01 Aug 2017

TL;DR: The fifth year of the Sentiment Analysis in Twitter task reruns the SemEval-2016 Task 4 subtasks, adds Arabic for all subtasks, and releases profile information for the Twitter users who posted the target tweets.

Abstract: This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.

1,107 citations

Cites background from "Temporal Patterns of Happiness and ..."

..., 2010), social science (Dodds et al., 2011), and market research (Burton and Soboleva, 2011; Qureshi et al....

Quantitative Analysis of Culture Using Millions of Digitized Books

Björn-Olav Dozo

17 Dec 2010

TL;DR: The authors survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, using a corpus of digitized texts containing about 4% of all books ever printed.

Abstract: This article, published in Science, reports one of the first analytical uses of Google Books, based on n-grams (Google Ngrams). We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can ...

735 citations

Proceedings Article

SemEval-2016 Task 4: Sentiment Analysis in Twitter

Preslav Nakov¹, Alan Ritter², Sara Rosenthal³, Fabrizio Sebastiani⁴, Veselin Stoyanov⁵

Qatar Foundation¹, Ohio State University², Columbia University³, Qatar Computing Research Institute⁴, Facebook⁵

01 Jun 2016

TL;DR: This paper discusses the fourth year of the Sentiment Analysis in Twitter task. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions; the three new subtasks focus on two variants of the basic sentiment classification in Twitter task.

Abstract: This paper discusses the fourth year of the “Sentiment Analysis in Twitter Task”. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. The first two subtasks are reruns from prior years and ask to predict the overall sentiment, and the sentiment towards a topic in a tweet. The three new subtasks focus on two variants of the basic “sentiment classification in Twitter” task. The first variant adopts a five-point scale, which confers an ordinal character to the classification task. The second variant focuses on the correct estimation of the prevalence of each class of interest, a task which has been called quantification in the supervised learning literature. The task continues to be very popular, attracting a total of 43 teams.

702 citations

References

Journal Article

A mathematical theory of communication

Claude E. Shannon

01 Jul 1948-Bell System Technical Journal

TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.

Abstract: In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases.

65,425 citations

Journal Article

Measurement of diversity

E. H. Simpson

01 Jan 1949-Nature

TL;DR: In this article, the authors define and examine a measure of concentration in terms of population constants, and examine the relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution.

Abstract: The 'characteristic' defined by Yule [1] and the 'index of diversity' defined by Fisher [2] are two measures of the degree of concentration or diversity achieved when the individuals of a population are classified into groups. Both are defined as statistics to be calculated from sample data and not in terms of population constants. The index of diversity has so far been used chiefly with the logarithmic distribution. It cannot be used everywhere, as it does not always give values which are independent of sample size; it cannot do so, for example, when applied to an infinite population of individuals classified into a finite number of groups. Williams [3] has pointed out a relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution. The present purpose is to define and examine a measure of concentration in terms of population constants.

10,077 citations

Book

The Measurement of Meaning

Charles E. Osgood, George J. Suci, Percy H. Tannenbaum

01 Jan 1957

TL;DR: In this article, the authors deal with the nature and theory of meaning and present a new, objective method for its measurement which they call the semantic differential, which can be adapted to a wide variety of problems in such areas as clinical psychology, social psychology, linguistics, mass communications, esthetics, and political science.

Abstract: In this pioneering study, the authors deal with the nature and theory of meaning and present a new, objective method for its measurement which they call the semantic differential. This instrument is not a specific test, but rather a general technique of measurement that can be adapted to a wide variety of problems in such areas as clinical psychology, social psychology, linguistics, mass communications, esthetics, and political science. The core of the book is the authors' description, application, and evaluation of this important tool and its far-reaching implications for empirical research.

9,476 citations

Proceedings Article

What is Twitter, a social network or a news media?

Haewoon Kwak¹, Changhyun Lee¹, Hosung Park¹, Sue Moon¹

KAIST¹

26 Apr 2010

TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.

Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Book

Human behavior and the principle of least effort

George Kingsley Zipf¹

Harvard University¹

01 Jan 1949

5,898 citations