Proceedings Article

Empirical Evaluation of Profile Characteristics for Gender Classification on Twitter

TLDR
This work explores profile characteristics for gender classification on Twitter and provides a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousand and, in some cases, to only 40 features.
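The TL;DR does not say how the reduction works. To show why text-based profile characteristics can reach millions of features, and one generic way to cap that number, here is a minimal sketch assuming character n-grams (n up to 5) of a profile string as the features and the hashing trick as the reducer. The hashing trick is a standard technique swapped in for illustration, not the authors' method, and the function names are hypothetical.

```python
import zlib


def char_ngrams(text: str, n_max: int = 5) -> list[str]:
    """Return every character n-gram of `text` (lowercased) with 1 <= n <= n_max."""
    text = text.lower()
    return [text[i:i + n] for n in range(1, n_max + 1) for i in range(len(text) - n + 1)]


def ngram_space_size(alphabet_size: int, n_max: int) -> int:
    """Count the distinct n-grams possible for n = 1..n_max over an alphabet."""
    return sum(alphabet_size ** n for n in range(1, n_max + 1))


def hashed_features(text: str, n_buckets: int, n_max: int = 5) -> list[int]:
    """Fold the n-grams of `text` into a fixed-length count vector (hashing trick).

    CRC32 is used instead of Python's built-in hash() because hash() of a str is
    salted per process, which would make the bucket layout non-reproducible.
    """
    vec = [0] * n_buckets
    for gram in char_ngrams(text, n_max):
        vec[zlib.crc32(gram.encode("utf-8")) % n_buckets] += 1
    return vec


if __name__ == "__main__":
    # 26 + 26**2 + ... + 26**5 possible lowercase n-grams: order of millions.
    print(ngram_space_size(26, 5))  # 12356630
    # A hypothetical first name folded into 40 buckets; "maria" has 5+4+3+2+1 = 15 n-grams.
    vec = hashed_features("Maria", n_buckets=40)
    print(len(vec), sum(vec))  # 40 15
```

Choosing `n_buckets=40` only echoes the feature count quoted above; it does not reproduce how the authors reached it. With so few buckets many n-grams collide, which is the price of a fixed, small vector length.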
Abstract
Online Social Networks (OSNs) provide reliable communication among users from different countries. The volume of text generated by OSNs is huge and highly informative. Gender classification can serve commercial organizations for advertising, law enforcement for legal investigation, and others for social reasons. Here we explore profile characteristics for gender classification on Twitter. Unlike existing approaches to gender classification that depend heavily on posted text such as tweets, here we study the relative strengths of different characteristics extracted from Twitter profiles (e.g., first name and background color in a user's profile page). Our goal is to evaluate profile characteristics with respect to their predictive accuracy and computational complexity. In addition, we provide a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousand and, in some cases, to only 40 features. We validate our approach by examining different classifiers over a large dataset of Twitter profiles.
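The abstract names first name and background color as example profile characteristics but does not detail the features or classifiers. The following minimal sketch shows how two such characteristics could feed a single gender classifier. It assumes first-name character bigrams and trigrams, a background color quantized to 64 coarse bins, and a from-scratch multinomial Naive Bayes; these are our choices, not necessarily what the paper evaluated. All helper names and the toy profiles are hypothetical.

```python
import math
from collections import Counter, defaultdict


def quantize_color(hex_color: str, levels: int = 4) -> str:
    """Map a 24-bit 'RRGGBB' color to one of levels**3 coarse bins (64 by default)."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return "-".join(str(c * levels // 256) for c in (r, g, b))


def profile_tokens(first_name: str, bg_color: str) -> list[str]:
    """Turn two profile characteristics into prefixed categorical tokens."""
    name = first_name.lower()
    grams = [name[i:i + n] for n in (2, 3) for i in range(len(name) - n + 1)]
    return [f"name:{g}" for g in grams] + [f"color:{quantize_color(bg_color)}"]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n / n_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy profiles (first name, background color, label); not real data.
    train = [("Maria", "FF69B4", "female"), ("Anna", "FFC0CB", "female"),
             ("John", "0000FF", "male"), ("Mark", "1E90FF", "male")]
    model = NaiveBayes().fit([profile_tokens(n, c) for n, c, _ in train],
                             [label for _, _, label in train])
    print(model.predict(profile_tokens("Maria", "FF69B4")))
```

Prefixing tokens with `name:` and `color:` keeps the two characteristics in separate namespaces, so each one's contribution can be examined by dropping its prefix from the token lists and retraining.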


Citations
Proceedings Article

The Social Impact of Natural Language Processing

TL;DR: A number of social implications of NLP are identified, and their ethical significance, as well as ways to address them, are discussed.
Proceedings Article

Demographic Factors Improve Classification Performance

TL;DR: Including age or gender information in text-classification tasks consistently and significantly improves performance over demographic-agnostic models, which are commonly used in natural language processing tasks.
Proceedings Article

Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week

TL;DR: The experiments show that social media data can provide sufficient linguistic evidence to reliably predict two of four personality dimensions, and a novel corpus of 1.2M English tweets annotated with Myers-Briggs personality type and gender is presented.
Proceedings Article

Multitask learning for mental health conditions with limited social media data

TL;DR: The framework proposed significantly improves over all baselines and single-task models for predicting mental health conditions, with particularly significant gains for conditions with limited data, and establishes for the first time the potential of deep learning in the prediction of mental health from online user-generated text.
Dissertation

Gender, Genre, and Writing Style in Formal Written Texts

Sara Steiner
TL;DR: This paper found that only a few things changed over the years; the characteristics of male and female language remained almost the same, which is surprising given the change in women's position in society.