Efficient User Profiling in Twitter Social Network Using Traditional Classifiers

doi:10.1007/978-3-319-23258-4_35

Book Chapter•DOI•

Efficient User Profiling in Twitter Social Network Using Traditional Classifiers

M. A. Raghuram¹, K. Akshay¹, K. Chandrasekaran¹•Institutions (1)

National Institute of Technology, Karnataka¹

01 Jan 2016-pp 399-411

TL;DR: An efficient supervised machine learning approach which categorizes Twitter users based on three important features into six interest categories, and proposes a design for a real-time system for Twitter user profiling along with a prototype implementation.

Abstract: Any discussion in social media can be fruitful if the people involved in the discussion are related to a field. Similarly, to advertise an event, it is useful to find users who are interested in the content of the event. In social networks like Twitter, which contain a large number of users, categorizing users based on their interests helps with both goals. This paper presents an efficient supervised machine learning approach which categorizes Twitter users based on three important feature groups (Tweet-based, User-based and Time-series based) into six interest categories: Politics, Entertainment, Entrepreneurship, Journalism, Science & Technology and Healthcare. We evaluate the proposed feature set with different traditional classifiers like Support Vector Machines, Naive Bayes, k-Nearest Neighbours, Decision Tree and Logistic Regression, and obtain up to 89.82% accuracy in classification. We also propose a design for a real-time system for Twitter user profiling along with a prototype implementation.
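The classifier comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn is available, uses a synthetic six-class dataset in place of the paper's Tweet-based, User-based and Time-series features (which the abstract does not specify), and the helper name `compare` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-user feature vectors with six interest
# categories. This is NOT the paper's data or feature set.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# The five traditional classifiers named in the abstract, with
# scikit-learn defaults (an assumption; the paper's settings are unknown).
classifiers = {
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "k-Nearest Neighbours": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}


def compare(classifiers, X, y, folds=5):
    """Return the mean cross-validated accuracy for each classifier."""
    return {
        name: cross_val_score(clf, X, y, cv=folds).mean()
        for name, clf in classifiers.items()
    }


scores = compare(classifiers, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

The accuracies printed here reflect only the synthetic data and say nothing about the 89.82% figure reported in the abstract.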

Citations

Proceedings Article•DOI•

A comparison of linear discriminant analysis and ridge classifier on Twitter data

Anagh Singh¹, B. Shiva Prakash¹, K. Chandrasekaran¹•Institutions (1)

National Institute of Technology, Karnataka¹

29 Apr 2016

TL;DR: This document is about the accuracy analysis of two of the most prominent classifiers present in today's academic arena, Ridge Classifier and Linear Discriminant Analysis, and compares their effectiveness at mapping a set of data scraped in real-time from Twitter to its corresponding generalised hashtag.

Abstract: This document is about the accuracy analysis of two of the most prominent classifiers present in today's academic arena. Classifiers are being used extensively in machine learning applications today and need to present a high rate of success to be considered useful. Tikhonov regularization incorporated within the Ridge Classifier is the basis for its classification. It utilises the Levenberg-Marquardt algorithm for non-linear least-squares problems to classify objects. Linear Discriminant Analysis, on the other hand, utilises aspects of ANOVA [2,3] and regression analysis. LDA works by getting explicit information from the user. It needs the definition of the variables, both dependent and independent. It doesn't use any implicit assumptions in its modelling. There is no interconnection between the two variables initially. Using these two classifiers we compare their effectiveness at mapping a set of data scraped in real-time from Twitter to its corresponding generalised hashtag, and suggest why the differences, if any, arise.
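A comparison of this kind might look like the sketch below. It assumes scikit-learn's `RidgeClassifier` and `LinearDiscriminantAnalysis` as stand-ins for the two classifiers, TF-IDF as the text representation (the abstract does not name one), and a toy corpus of tweets and hashtags invented purely for illustration.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier

# Hypothetical toy tweets and hashtag labels, invented for illustration;
# the paper's data was scraped from Twitter and is not reproduced here.
tweets = [
    "election results announced tonight",
    "parliament votes on new budget",
    "senator speaks about election reform",
    "new phone launch with faster chip",
    "software update improves battery life",
    "chip maker unveils faster processor",
]
hashtags = ["#politics"] * 3 + ["#tech"] * 3

# TF-IDF features (an assumed representation). LDA requires a dense array.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets).toarray()

models = {
    "ridge": RidgeClassifier(alpha=1.0),
    "lda": LinearDiscriminantAnalysis(),
}

new_tweets = ["election budget debate", "faster chip in new phone"]
X_new = vectorizer.transform(new_tweets).toarray()

# Fit each model on the same features and predict a hashtag per new tweet.
predictions = {
    name: list(model.fit(X, hashtags).predict(X_new))
    for name, model in models.items()
}
print(predictions)
```

With so few samples the outputs are only a smoke test; a meaningful comparison would need held-out data and an accuracy metric.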

14 citations

Cites background from "Efficient User Profiling in Twitter..."

...The solution may be an improvised one requiring manual intervention every time to a fully automated solution which can scour an entire website and collect appropriate information....

Journal Article•DOI•

A comparative analysis of Twitter users who Tweeted on psychology and political science journal articles

Yanfen Zhou, Jin-Cheon Na

11 Nov 2019-Online Information Review

TL;DR: From this study, researchers or research organizations may have a better idea on who their audiences are, and hence more effective strategies can be taken by researchers or organizations to reach a wider audience and enhance their influence.

Abstract: The purpose of this paper is to understand the similarities and differences between the Twitter users who tweeted on journal articles in psychology and political science disciplines. The data were collected from Web of Science, Altmetric.com, and Twitter. A total of 91,826 tweets with 22,541 distinct Twitter user profiles for the psychology discipline and 29,958 tweets with 10,478 distinct Twitter user profiles for the political science discipline were used for analysis. The demographics analysis includes gender, geographic location, individual or organization user, academic or non-academic background, and psychology/political science domain knowledge background. A machine learning approach using support vector machine (SVM) was used for user classification based on the Twitter user profile information. Latent Dirichlet allocation (LDA) topic modeling was used to discover the topics that the users discussed from the tweets. Results showed that the demographics of Twitter users who tweeted on psychology and political science are significantly different. Tweets on journal articles in psychology reflected more of the impact of scientific research findings on the general public and attracted more attention from the general public than the ones in political science. Disciplinary differences in terms of user demographics exist, and thus it is important to take the discipline into consideration for future altmetrics studies. From this study, researchers or research organizations may have a better idea of who their audiences are, and hence more effective strategies can be taken by researchers or organizations to reach a wider audience and enhance their influence.

12 citations

Book Chapter•DOI•

FRISK: A Multilingual Approach to Find twitteR InterestS via wiKipedia

Coriane Nana Jipmo¹, Gianluca Quercini¹, Nacéra Bennacer¹•Institutions (1)

Université Paris-Saclay¹

05 Nov 2017

TL;DR: Several studies have shown that the users of Twitter reveal their interests (i.e., what they like) while they share their opinions, preferences and personal stories.

Abstract: Several studies have shown that the users of Twitter reveal their interests (i.e., what they like) while they share their opinions, preferences and personal stories.

7 citations

Journal Article•DOI•

Determining the interests of social media users: two approaches

Nacéra Bennacer Seghouani¹, Coriane Nana Jipmo¹, Gianluca Quercini¹•Institutions (1)

University of Paris-Sud¹

01 Apr 2019

TL;DR: It is shown that Frisk is capable of inferring the interests in a multilingual context with good accuracy and that the psychological dimensions used by Ascertain are also good predictors of a user’s interests.

Abstract: Although social media platforms serve diverse purposes, from social and professional networking to photo sharing and blogging, people frequently use them to share their thoughts and opinions and, most importantly, their interests (e.g., politics, economy, sports). Understanding the interests of social media users is key to many applications that need to characterize them to recommend some services and find other individuals with similar interests. In this paper, we propose two approaches to the automatic determination of the interests of social media users. The first, which we named Frisk, is an unsupervised multilingual approach that determines the interests of a user from the explicit meaning of the words that occur in the user’s posts. The second, which we termed Ascertain, is a supervised approach that resorts to the hidden dimensions of the words that several studies indicated to be capable of revealing some of the psychological processes and personality traits of a person. In our evaluation, which we performed on two datasets obtained from Twitter, we show that Frisk is capable of inferring the interests in a multilingual context with good accuracy and that the psychological dimensions used by Ascertain are also good predictors of a user’s interests.

7 citations

Proceedings Article•DOI•

Occupation profiling with user-generated geolocation data

Xiaohui Han, Lianhai Wang, Guangqi Liu, Zhao Dawei, Shujiang Xu (+1 more)

01 Oct 2017

TL;DR: Experiments performed using real-world SNA data reveal that LPF can indicate the occupation to a certain extent and are more efficient than word count features and TF-IDF features, and it is revealed that good occupation profiling accuracy can be achieved by integrating LPF and LDA topic features inferred from text content.

Abstract: The flourishing of social networking applications (SNA) in recent years has revolutionized the way we live. On the other hand, SNA are also used for malicious activities. Identifying the physical person behind an SNA account is a challenging task in investigations of SNA-involved criminal cases. As one way to address this challenge, automatic user attribute inference, which is also referred to as user profiling, has become an increasingly attractive research topic. However, existing methods may suffer from problems such as low quality content and complex social networks. The increasing amounts of user-generated geolocation data on SNA give rise to new opportunities for user profiling. In this paper, we study the problem of user profiling with user-generated geolocation data. Specifically, we take occupation profiling as an entry point. We design a supervised occupation classification framework and propose two categories of geolocation-based features, i.e., mobility pattern features (MPF) and location preference features (LPF). Experiments performed using real-world SNA data reveal that LPF can indicate the occupation to a certain extent and are more efficient than word count features and TF-IDF features. Moreover, it is also revealed that good occupation profiling accuracy can be achieved by integrating LPF and LDA topic features inferred from text content.

3 citations

Cites background from "Efficient User Profiling in Twitter..."

...The goal of user profiling is inferring the missing personal attributes of a user, such as gender [1], [2], age [3], geolocation [4], [5], occupation [6], [7], interests [8]–[11], and personality [12] with observed data generated by the user and others....

References

Journal Article•DOI•

Twitter mood predicts the stock market.

Johan Bollen¹, Huina Mao¹, Xiao-Jun Zeng²•Institutions (2)

Indiana University¹, University of Manchester²

01 Mar 2011-Journal of Computational Science

TL;DR: This work investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time and indicates that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others.

4,453 citations

"Efficient User Profiling in Twitter..." refers background in this paper

...Also, there are various studies on sentiment analysis and tweet analysis [8, 9] in Twitter social network alone....

Proceedings Article•

Twitter Sentiment Analysis: The Good the Bad and the OMG!

Efthymios Kouloumpis, Theresa Wilson¹, Johanna D. Moore•Institutions (1)

Johns Hopkins University¹

05 Jul 2011

TL;DR: This paper evaluates the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging, and uses existing hashtags in the Twitter data for building training data.

Abstract: In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.

1,261 citations

Book Chapter•DOI•

Sentiment knowledge discovery in twitter streaming data

Albert Bifet¹, Eibe Frank¹•Institutions (1)

University of Waikato¹

06 Oct 2010

TL;DR: To deal with streaming unbalanced classes, a sliding window Kappa statistic is proposed for evaluation in time-changing data streams, and a study on Twitter data is performed using learning algorithms for data streams.

Abstract: Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, and generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing on classification problems, and then consider these streams for opinion mining and sentiment analysis. To deal with streaming unbalanced classes, we propose a sliding window Kappa statistic for evaluation in time-changing data streams. Using this statistic we perform a study on Twitter data using learning algorithms for data streams.
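One way to picture a sliding window Kappa statistic is the sketch below. It assumes the usual Cohen's kappa form, kappa = (p0 - pc) / (1 - pc), computed over only the most recent predictions, where p0 is the accuracy inside the window and pc is the agreement expected by chance from the class frequencies. The class name `SlidingWindowKappa`, its methods, and the convention of returning 0.0 when kappa is undefined are illustrative choices, not taken from the paper.

```python
from collections import Counter, deque


class SlidingWindowKappa:
    """Cohen's kappa over the most recent `window_size` predictions."""

    def __init__(self, window_size):
        # deque with maxlen drops the oldest pair once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def kappa(self):
        n = len(self.window)
        if n == 0:
            return 0.0
        # p0: observed accuracy inside the window.
        p0 = sum(t == p for t, p in self.window) / n
        # pc: chance agreement from true and predicted class frequencies.
        true_counts = Counter(t for t, _ in self.window)
        pred_counts = Counter(p for _, p in self.window)
        pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
        if pc == 1.0:
            return 0.0  # kappa is undefined here; 0.0 is an assumed convention
        return (p0 - pc) / (1.0 - pc)


# An unbalanced stream scored against a majority-class predictor:
# accuracy is 75%, yet kappa shows no skill beyond chance.
metric = SlidingWindowKappa(window_size=4)
for y_true in [1, 0, 0, 0]:
    metric.update(y_true, 0)
print(metric.kappa())  # 0.0
```

This illustrates why a chance-corrected measure suits unbalanced streams: plain accuracy rewards always predicting the majority class, while kappa does not.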

612 citations

"Efficient User Profiling in Twitter..." refers background in this paper

...Also, there are various studies on sentiment analysis and tweet analysis [8, 9] in Twitter social network alone....

Proceedings Article•DOI•

Discovering users' topics of interest on twitter: a first look

Matthew Michelson, Sofus A. Macskassy

26 Oct 2010

TL;DR: A "topic profile" is developed, which characterizes users' topics of interest, by discerning which categories appear frequently and cover the entities in the Tweets, and it is demonstrated that even in this early work, the main topics of interest for the users are successfully discovered.

Abstract: Twitter, a micro-blogging service, provides users with a framework for writing brief, often-noisy postings about their lives. These posts are called "Tweets." In this paper we present early results on discovering Twitter users' topics of interest by examining the entities they mention in their Tweets. Our approach leverages a knowledge base to disambiguate and categorize the entities in the Tweets. We then develop a "topic profile," which characterizes users' topics of interest, by discerning which categories appear frequently and cover the entities. We demonstrate that even in this early work we are able to successfully discover the main topics of interest for the users in our study.

360 citations

"Efficient User Profiling in Twitter..." refers background in this paper

...In this context, the problem of automatically identifying user interests [15] and...

Proceedings Article•

Event Summarization Using Tweets

Deepayan Chakrabarti¹, Kunal Punera¹•Institutions (1)

Yahoo!¹

05 Jul 2011

TL;DR: It is argued that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets, and a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models is given.

Abstract: Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research has shown that a considerable fraction of these tweets are about "events," and the detection of novel events in the tweet-stream has attracted a lot of research interest. However, very little research has focused on properly displaying this real-time information about events. For instance, the leading search engines simply display all tweets matching the queries in reverse chronological order. In this paper we argue that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. We formalize the problem of summarizing event-tweets and give a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models. In addition, through extensive experiments on real-world data we show that our model significantly outperforms some intuitive and competitive baselines.

331 citations