Proceedings ArticleDOI

Our Twitter Profiles, Our Selves: Predicting Personality with Twitter

TL;DR: It is argued that being able to predict user personality goes well beyond the initial goal of informing the design of new personalized applications as it, for example, expands current studies on privacy in social media.
Abstract: Psychological personality has been shown to affect a variety of aspects: preferences for interaction styles in the digital world and for music genres, for example. Consequently, the design of personalized user interfaces and music recommender systems might benefit from understanding the relationship between personality and use of social media. Since there has not been a study between personality and use of Twitter at large, we set out to analyze the relationship between personality and different types of Twitter users, including popular users and influentials. For 335 users, we gather personality data, analyze it, and find that both popular users and influentials are extroverts and emotionally stable (low in the trait of Neuroticism). Interestingly, we also find that popular users are 'imaginative' (high in Openness), while influentials tend to be 'organized' (high in Conscientiousness). We then show a way of accurately predicting a user's personality simply based on three counts publicly available on profiles: following, followers, and listed counts. Knowing these three quantities about an active user, one can predict the user's five personality traits with a root-mean-squared error below 0.88 on a $[1,5]$ scale. Based on these promising results, we argue that being able to predict user personality goes well beyond our initial goal of informing the design of new personalized applications as it, for example, expands current studies on privacy in social media.
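The error measure quoted in the abstract can be illustrated with a minimal Python sketch. The scores below are hypothetical values on the [1, 5] scale, invented only for illustration, and `rmse` is an illustrative helper name, not code from the paper.

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length lists of scores."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical questionnaire scores and model predictions on the [1, 5] scale.
actual_extraversion = [3.0, 4.0, 2.0, 5.0]
predicted_extraversion = [3.5, 3.5, 3.0, 4.0]

error = rmse(predicted_extraversion, actual_extraversion)
print(f"RMSE on the [1, 5] scale: {error:.2f}")
```

Because the squared differences are averaged before the square root is taken, a few large misses raise the RMSE more than many small ones do.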
Citations
Journal ArticleDOI
TL;DR: It is shown that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender.
Abstract: We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.

2,232 citations

Journal ArticleDOI
TL;DR: This article deconstructs the ideological grounds of datafication, an ideology rooted in problematic ontological and epistemological claims that shows characteristics of a widespread secular belief in the context of a larger social media logic.
Abstract: Metadata and data have become a regular currency for citizens to pay for their communication services and security—a trade-off that has nestled into the comfort zone of most people. This article deconstructs the ideological grounds of datafication. Datafication is rooted in problematic ontological and epistemological claims. As part of a larger social media logic, it shows characteristics of a widespread secular belief. Dataism, as this conviction is called, is so successful because masses of people — naively or unwittingly — trust their personal information to corporate platforms. The notion of trust becomes more problematic because people’s faith is extended to other public institutions (e.g. academic research and law enforcement) that handle their (meta)data. The interlocking of government, business, and academia in the adaptation of this ideology makes us want to look more critically at the entire ecosystem of connective media.

1,076 citations


Cites background from "Our Twitter Profiles, Our Selves: P..."

  • ...For instance, Quercia et al. (2011) analyzed the relationships between personality and different types of twitterers, finding that [...] (Footnote 2: Just a few remarks about Twitter’s alleged representativeness and inherent biases....)

    [...]

Journal ArticleDOI
TL;DR: It is concluded that previous work may have overestimated the degree of ideological segregation in social-media usage and liberals were more likely than conservatives to engage in cross-ideological dissemination.
Abstract: We estimated ideological preferences of 3.8 million Twitter users and, using a data set of nearly 150 million tweets concerning 12 political and nonpolitical issues, explored whether online communication resembles an “echo chamber” (as a result of selective exposure and ideological segregation) or a “national conversation.” We observed that information was exchanged primarily among individuals with similar ideological preferences in the case of political issues (e.g., 2012 presidential election, 2013 government shutdown) but not many other current events (e.g., 2013 Boston Marathon bombing, 2014 Super Bowl). Discussion of the Newtown shootings in 2012 reflected a dynamic process, beginning as a national conversation before transforming into a polarized exchange. With respect to both political and nonpolitical issues, liberals were more likely than conservatives to engage in cross-ideological dissemination; this is an important asymmetry with respect to the structure of communication that is consistent with psychological theory and research bearing on ideological differences in epistemic, existential, and relational motivation. Overall, we conclude that previous work may have overestimated the degree of ideological segregation in social-media usage.

940 citations

Journal ArticleDOI
TL;DR: It is shown that computers’ judgments of people’s personalities based on their digital footprints are more accurate and valid than judgments made by their close others or acquaintances, and that computer personality judgments have higher external validity when predicting life outcomes such as substance use, political attitudes, and physical health.
Abstract: Judging others’ personalities is an essential skill in successful social living, as personality is a key driver behind people’s interactions, behaviors, and emotions. Although accurate personality judgments stem from social-cognitive skills, developments in machine learning show that computer models can also make valid judgments. This study compares the accuracy of human and computer-based personality judgments, using a sample of 86,220 volunteers who completed a 100-item personality questionnaire. We show that (i) computer predictions based on a generic digital footprint (Facebook Likes) are more accurate (r = 0.56) than those made by the participants’ Facebook friends using a personality questionnaire (r = 0.49); (ii) computer models show higher interjudge agreement; and (iii) computer personality judgments have higher external validity when predicting life outcomes such as substance use, political attitudes, and physical health; for some outcomes, they even outperform the self-rated personality scores. Computers outpacing humans in personality judgment presents significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy.

740 citations
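The accuracies quoted in the abstract above (r = 0.56 versus r = 0.49) are correlation coefficients. As a rough illustration, the sketch below computes Pearson's r between self-reported scores and two sets of judgments. It assumes that accuracy is measured as agreement with self-reports, and all numbers are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical trait scores for four people (not data from the study).
self_report = [2.0, 3.0, 4.0, 5.0]
computer_judgment = [2.5, 2.5, 4.0, 4.5]
friend_judgment = [3.0, 2.0, 4.5, 3.5]

r_computer = pearson_r(self_report, computer_judgment)
r_friend = pearson_r(self_report, friend_judgment)
print(f"computer: r = {r_computer:.2f}, friend: r = {r_friend:.2f}")
```

A higher r means the judgments track the self-reported ordering more closely. It does not say how far individual judgments are from the self-reports in absolute terms.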

Journal ArticleDOI
TL;DR: A survey of technologies capable of dealing with human personality, and a conceptual model underlying the three main problems addressed in the literature, namely Automatic Personality Recognition, Automatic Personality Perception and Automatic Personality Synthesis.
Abstract: Personality is a psychological construct aimed at explaining the wide variety of human behaviors in terms of a few, stable and measurable individual characteristics. In this respect, any technology involving understanding, prediction and synthesis of human behavior is likely to benefit from Personality Computing approaches, i.e. from technologies capable of dealing with human personality. This paper is a survey of such technologies and it aims at providing not only a solid knowledge base about the state-of-the-art, but also a conceptual model underlying the three main problems addressed in the literature, namely Automatic Personality Recognition (inference of the true personality of an individual from behavioral evidence), Automatic Personality Perception (inference of personality others attribute to an individual based on her observable behavior) and Automatic Personality Synthesis (generation of artificial personalities via embodied agents). Furthermore, the article highlights the issues still open in the field and identifies potential application areas.

450 citations


Cites background or methods from "Our Twitter Profiles, Our Selves: P..."

  • ...Social media are one of the main channels through which people interact with others, an ideal means for self-disclosure and, therefore, an excellent ground for research on personality computing [51], [63], [64], [65], [66], [67], [68], [69] (see Table 4 for a synopsis of data, approaches and results)....

    [...]

  • ...Other
    [51] | 167 subjects | 167 Facebook profiles | profile info., egocentric networks, LIWC | R | 0.12 0.10 0.10 0.11 0.10 (MAE)
    [63] | 279 subjects | 2000 tweets per subject | LIWC, MRC, profile info. | R | 0.16 0.13 0.14 0.18 0.12 ...

    [...]

  • ...MAE MAE MAE MAE MAE
    [64] | 335 subjects | 335 Twitter profiles | number of followers/followings, listed counts | R | 0.88 0.79 0.76 0.85 0.69 (RMSE)
    [65] | 209 subjects | 209 RenRen profiles | profile info., usage statistics, emotional states | C(2) | 83.8 69.7 82.4 74.9 81.1 (F)
         |              |                     |                                                   | C(3) | 71.7 72.3 70.1 71.0 69.5 (F)
    [66] | 156 subjects | 473 posts on FriendFeed | some LIWC categories | U | average accuracy 63.1
    [67] | 10000 subjects | 10,000 blog posts | LIWC | C(2) | 80.0 (ACC)
    [68] | 300 subjects | 60,000 favorite pictures | visual patterns, aesthetic preferences | R | 0.19 0.17 0.22 0.12 0.17 (r)
    The table reports, from left to right, the number of subjects involved in the experiments, number and type of behavioral samples, main cues, type of task and performance over different traits....

    [...]

  • ...The performance for the classification tasks is reported in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), F-Measure (F) and accuracy (ACC)....

    [...]
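For reference, the two error measures named in the snippets above are conventionally defined as follows. The notation is an assumption, not taken from the cited survey: $\hat{y}_i$ is the predicted score for subject $i$, $y_i$ is the observed score, and $n$ is the number of subjects.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```

For the same predictions, $\mathrm{MAE} \le \mathrm{RMSE}$, with equality only when every absolute error is the same, so values reported in the two measures are not directly comparable across rows.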

References
Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: tic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEE’s) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEE’s and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations

01 Jan 1999
TL;DR: The Big Five taxonomy is a taxonomy of personality dimensions derived from analyses of the natural-language terms people use to describe themselves and others, and it has been used for personality assessment.
Abstract: Taxonomy is always a contentious issue because the world does not come to us in neat little packages (S. Personality has been conceptualized from a variety of theoretical perspectives, and at various levels of Each of these levels has made unique contributions to our understanding of individual differences in behavior and experience. However, the number of personality traits, and scales designed to measure them, escalated without an end in sight (Goldberg, 1971). Researchers, as well as practitioners in the field of personality assessment, were faced with a bewildering array of personality scales from which to choose, with little guidance and no overall rationale at hand. What made matters worse was that scales with the same name often measure concepts that are not the same, and scales with different names often measure concepts that are quite similar. Although diversity and scientific pluralism are useful, the systematic accumulation of findings and the communication among researchers became difficult amidst the Babel of concepts and scales. Many personality researchers had hoped that they might devise the structure that would transform the Babel into a community speaking a common language. However, such an integration was not to be achieved by any one researcher or by any one theoretical perspective. As Allport once put it, "each assessor has his own pet units and uses a pet battery of diagnostic devices" (1958, p. 258). What personality psychology needed was a descriptive model, or taxonomy, of its subject matter. One of the central goals of scientific taxonomies is the definition of overarching domains within which large numbers of specific instances can be understood in a simplified way. Thus, in personality psychology, a taxonomy would permit researchers to study specified domains of personality characteristics, rather than examining separately the thousands of particular attributes that make human beings individual and unique.
Moreover, a generally accepted taxonomy would greatly facilitate the accumulation and communication of empirical findings by offering a standard vocabulary, or nomenclature. After decades of research, the field is approaching consensus on a general taxonomy of personality traits, the "Big Five" personality dimensions. These dimensions do not represent a particular theoretical perspective but were derived from analyses of the natural-language terms people use to describe themselves and others. Rather than replacing all previous systems, the Big Five taxonomy serves an integrative function because it can represent the various and diverse systems of personality …

7,787 citations


"Our Twitter Profiles, Our Selves: P..." refers background in this paper

  • ...As we shall see in the next section, age is an important factor as it affects a user’s activity.... (Footnote 1: http://www.mypersonality.org/wiki/)

    [...]

Journal ArticleDOI
01 Mar 2002
TL;DR: This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Abstract: 1. What's It All About? 2. Input: Concepts, Instances, Attributes 3. Output: Knowledge Representation 4. Algorithms: The Basic Methods 5. Credibility: Evaluating What's Been Learned 6. Implementations: Real Machine Learning Schemes 7. Moving On: Engineering The Input And Output 8. Nuts And Bolts: Machine Learning Algorithms In Java 9. Looking Forward

5,936 citations


"Our Twitter Profiles, Our Selves: P..." refers methods in this paper

  • ...To this end, for each personality trait, we perform a regression analysis with a 10-fold cross-validation with 10 iterations using M5′ Rules [42]....

    [...]
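The evaluation protocol quoted above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: "10 iterations" is read here as ten repetitions of 10-fold cross-validation with fresh random partitions, a single-feature least-squares line stands in for M5′ Rules (which is not implemented here), and the data and all function names are hypothetical.

```python
import math
import random

def repeated_kfold_rmse(xs, ys, fit, predict, k=10, iterations=10, seed=0):
    """Average RMSE over `iterations` runs of k-fold cross-validation."""
    n = len(xs)
    if n != len(ys) or n < k:
        raise ValueError("need equal-length data with at least k samples")
    rng = random.Random(seed)
    run_scores = []
    for _ in range(iterations):
        order = list(range(n))
        rng.shuffle(order)                       # new random partition per run
        folds = [order[i::k] for i in range(k)]  # k disjoint held-out sets
        squared_errors = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                squared_errors.append((predict(model, xs[i]) - ys[i]) ** 2)
        # Every sample is held out exactly once per run, so pool all n errors.
        run_scores.append(math.sqrt(sum(squared_errors) / n))
    return sum(run_scores) / iterations

# Stand-in model (NOT M5' Rules): ordinary least squares on one feature.
def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data: log10 follower counts and trait scores on the [1, 5] scale.
log_followers = [1.0 + 0.1 * i for i in range(30)]
trait_scores = [2.0 + 0.5 * x for x in log_followers]

score = repeated_kfold_rmse(log_followers, trait_scores, fit_line, predict_line)
print(f"mean cross-validated RMSE: {score:.4f}")
```

Pooling the squared errors within each run, rather than averaging per-fold RMSE values, is one of several reasonable conventions. The quoted snippet does not say which one the paper uses.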

Journal ArticleDOI
TL;DR: The International Personality Item Pool (IPIP) is discussed as a prototype for public-domain personality measures.

2,822 citations


"Our Twitter Profiles, Our Selves: P..." refers background in this paper

  • ...The five-factor model of personality, or the big five, is the most comprehensive, reliable and useful set of personality concepts [7], [11]....

    [...]

Journal ArticleDOI
TL;DR: A model is outlined that integrates the strengths of previous theories of marriage, accounts for established findings, and indicates new directions for research on how marriages change.
Abstract: Although much has been learned from cross-sectional research on marriage, an understanding of how marriages develop, succeed, and fail is best achieved with longitudinal data. In view of growing interest in longitudinal research on marriage, the authors reviewed and evaluated the literature on how the quality and stability of marriages change over time. First, prevailing theoretical perspectives are examined for their ability to explain change in marital quality and stability. Second, the methods and findings of 115 longitudinal studies--representing over 45,000 marriages--are summarized and evaluated, yielding specific suggestions for improving this research. Finally, a model is outlined that integrates the strengths of previous theories of marriage, accounts for established findings, and indicates new directions for research on how marriages change.

2,459 citations


"Our Twitter Profiles, Our Selves: P..." refers background in this paper

  • ...Also, individuals high in Extraversion tend to maintain persistent communication with their friends [2], [5], [13], [34], while those high in Neuroticism withdraw from others during times of stress [22], [23] and generally report less satisfaction with the support received by their social networks [18], [19]....

    [...]

  • ...Finally, emotionally labile and impulsive individuals are high in Neuroticism [18], [19]....

    [...]