The development and psychometric properties of LIWC2007

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.


Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).


13,246 citations

Book

Sentiment Analysis and Opinion Mining

Bing Liu¹

University of Illinois at Chicago¹

01 May 2012

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language; it is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.


Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.


4,515 citations

Journal Article

The psychological meaning of words: LIWC and computerized text analysis methods

Yla R. Tausczik¹, James W. Pennebaker

University of Texas at Austin¹

01 Mar 2010-Journal of Language and Social Psychology

TL;DR: The Linguistic Inquiry and Word Count (LIWC) system is a text analysis program that counts words in psychologically meaningful categories; empirical results show that it detects meaning in a wide variety of experimental settings, including attentional focus, emotionality, social relationships, thinking styles, and individual differences.


Abstract: We are in the midst of a technological revolution whereby, for the first time, researchers can link daily word use to a broad array of real-world behaviors. This article reviews several computerized text analysis methods and describes how Linguistic Inquiry and Word Count (LIWC) was created and validated. LIWC is a transparent text analysis program that counts words in psychologically meaningful categories. Empirical results using LIWC demonstrate its ability to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles, and individual differences.
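The idea of counting words in psychologically meaningful categories can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `CATEGORIES` dictionary is a tiny made-up stand-in (the real LIWC dictionary is much larger and is not reproduced here), and `category_percentages` is a hypothetical helper, not part of the LIWC program. The sketch assumes entries ending in `*` match any word with that prefix and reports each category as a percentage of all words in the text.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; not the LIWC dictionary.
CATEGORIES = {
    "posemo": ["happy", "love", "nice", "good*"],
    "negemo": ["sad", "hate", "hurt*"],
    "article": ["a", "an", "the"],
}


def matches(token, pattern):
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def category_percentages(text):
    """Return each category's share of all tokens, in percent."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, patterns in CATEGORIES.items():
            if any(matches(token, p) for p in patterns):
                counts[name] += 1
    total = len(tokens)
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in CATEGORIES
    }


if __name__ == "__main__":
    print(category_percentages("The dog is happy and a cat is sad"))
```

A word may fall into several categories at once in this sketch, since every category is checked for every token.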


4,356 citations

Additional excerpts

...The intercorrelation among these words is low but highly significant (“a” with “an” = .13, “a” with “the” = .09, “an” with “the” = .09), resulting in Cronbach’s α of .14 (for a summary of all reliability statistics, see Pennebaker et al., 2007)....
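For readers unfamiliar with the reliability statistic mentioned in the excerpt, the following is a minimal sketch of the standard covariance-based Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of the summed score). The function name `cronbach_alpha` and the `rates` table are hypothetical, and the numbers are invented for illustration; they are not the data behind the .14 reported above, and the source does not state exactly which variant of α was computed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of observations.

    rows: list of observations, each a list of k item scores
    (here, one row per text and one column per word).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across observations.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score per observation.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented per-text usage rates (percent of words) for "a", "an", "the".
rates = [
    [2.1, 0.3, 5.0],
    [1.8, 0.5, 6.2],
    [2.6, 0.2, 4.1],
    [1.5, 0.4, 5.5],
    [2.2, 0.6, 4.8],
]

if __name__ == "__main__":
    print(round(cronbach_alpha(rates), 3))
```

When items are weakly correlated, as with the articles in the excerpt, the summed-score variance is barely larger than the sum of the item variances, so α stays close to zero.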

Proceedings Article

VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text

Clayton J. Hutto¹, Eric Gilbert¹

Georgia Institute of Technology¹

16 May 2014

TL;DR: Using a parsimonious rule-based model to assess the sentiment of tweets, the authors find that VADER outperforms individual human raters and generalizes more favorably across contexts than any of their benchmarks.


Abstract: The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.


3,299 citations

Cites background from "The development and psychometric pr..."

...LIWC is well-established and has been both internally and externally validated in a process spanning more than a decade of work by psychologists, sociologists, and linguists (Pennebaker et al., 2001; Pennebaker et al., 2007)....

Proceedings Article

Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment

Andranik Tumasjan¹, Timm O. Sprenger¹, Philipp Sandner¹, Isabell M. Welpe¹

Technische Universität München¹

16 May 2010

TL;DR: It is found that the mere number of messages mentioning a party reflects the election result, and joint mentions of two parties are in line with real world political ties and coalitions.


Abstract: Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real-world political ties and coalitions. An analysis of the tweets' political sentiment demonstrates close correspondence to the parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.
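The claim that message volume reflects the election result can be made concrete with a small sketch. The party names, mention counts, and vote shares below are invented placeholders, not figures from the study, and comparing the two with mean absolute error is an assumption about how such a comparison might be scored; the helper names `shares` and `mean_absolute_error` are hypothetical.

```python
def shares(counts):
    """Convert raw counts per party into percentage shares."""
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Invented placeholder data: number of messages mentioning each party,
# and each party's vote share in percent.
mentions = {"Party A": 300, "Party B": 500, "Party C": 200}
votes = {"Party A": 28.0, "Party B": 52.0, "Party C": 20.0}

mention_share = shares(mentions)
mae = mean_absolute_error(mention_share, votes)

if __name__ == "__main__":
    print(mention_share)
    print(round(mae, 2))
```

A small error under this kind of comparison would be consistent with mention volume tracking vote share, though the result depends on which time window of messages is counted.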


2,718 citations

Cites result from "The development and psychometric pr..."

...Leaving the rules for choosing a time frame unspecified would be no problem if counting party mentions in the Twittersphere led to identical results, irrespective of which period is chosen....

References


Journal Article

Development and validation of brief measures of positive and negative affect: The PANAS scales.

David Watson¹, Lee Anna Clark, Auke Tellegen

Southern Methodist University¹

30 May 1988-Journal of Personality and Social Psychology

TL;DR: Two 10-item mood scales that comprise the Positive and Negative Affect Schedule (PANAS) are developed and are shown to be highly internally consistent, largely uncorrelated, and stable at appropriate levels over a 2-month time period.


Abstract: In recent studies of the structure of affect, positive and negative affect have consistently emerged as two dominant and relatively independent dimensions. A number of mood scales have been created to measure these factors; however, many existing measures are inadequate, showing low reliability or poor convergent or discriminant validity. To fill the need for reliable and valid Positive Affect and Negative Affect scales that are also brief and easy to administer, we developed two 10-item mood scales that comprise the Positive and Negative Affect Schedule (PANAS). The scales are shown to be highly internally consistent, largely uncorrelated, and stable at appropriate levels over a 2-month time period. Normative data and factorial and external evidence of convergent and discriminant validity for the scales are also presented.


34,482 citations

Journal Article

Culture and the self: Implications for cognition, emotion, and motivation.

Hazel Rose Markus¹, Shinobu Kitayama²

University of Michigan¹, University of Oregon²

01 Jan 1991-Psychological Review

TL;DR: Theories of the self from both psychology and anthropology are integrated to define in detail the difference between a construal of the self as independent and a construal of the self as interdependent; these divergent construals should have specific consequences for cognition, emotion, and motivation.


Abstract: People in different cultures have strikingly different construals of the self, of others, and of the interdependence of the 2. These construals can influence, and in many cases determine, the very nature of individual experience, including cognition, emotion, and motivation. Many Asian cultures have distinct conceptions of individuality that insist on the fundamental relatedness of individuals to each other. The emphasis is on attending to others, fitting in, and harmonious interdependence with them. American culture neither assumes nor values such an overt connectedness among individuals. In contrast, individuals seek to maintain their independence from others by attending to the self and by discovering and expressing their unique inner attributes. As proposed herein, these construals are even more powerful than previously imagined. Theories of the self from both psychology and anthropology are integrated to define in detail the difference between a construal of the self as independent and a construal of the self as interdependent. Each of these divergent construals should have a set of specific consequences for cognition, emotion, and motivation; these consequences are proposed and relevant empirical literature is reviewed. Focusing on differences in self-construals enables apparently inconsistent empirical findings to be reconciled, and raises questions about what have been thought to be culture-free aspects of cognition, emotion, and motivation.


18,178 citations

Journal Article

Machine learning

Thomas G. Dietterich¹

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.


13,246 citations

Book

An Introduction to Functional Grammar

Michael Halliday

01 Jan 1985

TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.


Abstract: This third edition of An Introduction to Functional Grammar has been extensively revised. While retaining the organization and coverage of the earlier editions, it incorporates a considerable amount of new material. This includes strengthening the grammar through the use of data from a large-scale corpus, upgrading the description throughout, and giving greater emphasis to the systemic perspective, in which grammaticalization is understood in the context of an overall model of language. The approach taken in the book overcomes the distinction between theoretical and applied linguistics. The description of grammar is grounded in a comprehensive theory, but it is a theory which evolves in the process of being applied.


12,963 citations

Book

Descartes' Error: Emotion, Reason, and the Human Brain

Antonio R. Damasio

01 Jan 1994

TL;DR: The author argues that rational decisions are not the product of logic alone; they require the support of emotion and feeling. Drawing on his experience with neurological patients affected by brain damage, Damasio shows how the absence of emotions and feelings can break down rationality.


Abstract: Descartes' Error offers the scientific basis for ending the division between mind and body. Antonio Damasio contends that rational decisions are not the product of logic alone - they require the support of emotion and feeling. Drawing on his experience with neurological patients affected with brain damage, Dr Damasio shows how absence of emotions and feelings can break down rationality. He also offers a new perspective on what emotions and feelings actually are: a direct view of our own body states; a link between the body and its survival-oriented regulation on the one hand, and consciousness on the other. Written as a conversation between the author and an imaginary listener, Descartes' Error leads us to conclude that human organisms are endowed from their very beginning with a spirited passion for making choices, which the social mind can then use to build rational behaviour.


9,648 citations