Book Chapter

Efficient User Profiling in Twitter Social Network Using Traditional Classifiers

TL;DR: An efficient supervised machine learning approach which categorizes Twitter users based on three important features into six interest categories, and proposes a design for a real-time system for Twitter user profiling along with a prototype implementation.
Abstract: A discussion on social media is more fruitful when the people involved are related to the field under discussion. Similarly, to advertise an event, it is useful to find users who are interested in its content. In social networks like Twitter, which have a large number of users, categorizing users by their interests helps with both. This paper presents an efficient supervised machine learning approach which categorizes Twitter users based on three types of features (Tweet-based, User-based and Time-series based) into six interest categories: Politics, Entertainment, Entrepreneurship, Journalism, Science & Technology and Healthcare. We evaluate the proposed feature set with several traditional classifiers (Support Vector Machines, Naive Bayes, k-Nearest Neighbours, Decision Tree and Logistic Regression) and obtain up to 89.82% classification accuracy. We also propose a design for a real-time system for Twitter user profiling, along with a prototype implementation.
Citations
Proceedings Article
29 Apr 2016
TL;DR: This document is about the accuracy analysis of two of the most prominent classifiers present in today's academic arena, Ridge Classifier and Linear Discriminant Analysis, and compares their effectiveness at mapping a set of data scraped in real-time from Twitter to its corresponding generalised hashtag.
Abstract: This document is about the accuracy analysis of two of the most prominent classifiers present in today's academic arena. Classifiers are being used extensively in machine learning applications today and need to present a high rate of success to be considered useful. Tikhonov regularization incorporated within the Ridge Classifier is the basis for its classification. It utilises the Levenberg-Marquardt algorithm for non-linear least-squares problems to classify objects. Linear Discriminant Analysis, on the other hand, utilises aspects of ANOVA [2,3] and regression analysis. LDA works by getting explicit information from the user. It needs the definition of the variables, both dependent and independent. It doesn't use any implicit assumptions in its modelling. There is no interconnection between the two variables initially. Using these two classifiers, we compare their effectiveness at mapping a set of data scraped in real time from Twitter to its corresponding generalised hashtag, and suggest why the differences, if any, arise.

14 citations


Cites background from "Efficient User Profiling in Twitter..."

  • ...The solution may be an improvised one requiring manual intervention every time to a fully automated solution which can scour an entire website and collect appropriate information....


Journal Article
TL;DR: From this study, researchers or research organizations may have a better idea on who their audiences are, and hence more effective strategies can be taken by researchers or organizations to reach a wider audience and enhance their influence.
Abstract: The purpose of this paper is to understand the similarities and differences between the Twitter users who tweeted on journal articles in the psychology and political science disciplines. The data were collected from Web of Science, Altmetric.com, and Twitter. A total of 91,826 tweets with 22,541 distinct Twitter user profiles for the psychology discipline and 29,958 tweets with 10,478 distinct Twitter user profiles for the political science discipline were used for analysis. The demographics analysis includes gender, geographic location, individual or organization user, academic or non-academic background, and psychology/political science domain knowledge background. A machine learning approach using a support vector machine (SVM) was used for user classification based on the Twitter user profile information. Latent Dirichlet allocation (LDA) topic modeling was used to discover the topics that the users discussed in the tweets. Results showed that the demographics of Twitter users who tweeted on psychology and political science are significantly different. Tweets on journal articles in psychology reflected more of the impact of scientific research findings on the general public and attracted more attention from the general public than those in political science. Disciplinary differences in terms of user demographics exist, and thus it is important to take the discipline into consideration in future altmetrics studies. From this study, researchers or research organizations may have a better idea of who their audiences are, and hence can adopt more effective strategies to reach a wider audience and enhance their influence.

12 citations

Book Chapter
05 Nov 2017
TL;DR: Several studies have shown that the users of Twitter reveal their interests (i.e., what they like) while they share their opinions, preferences and personal stories.

7 citations

Journal Article
01 Apr 2019
TL;DR: It is shown that Frisk is capable of inferring the interests in a multilingual context with good accuracy and that the psychological dimensions used by Ascertain are also good predictors of a user’s interests.
Abstract: Although social media platforms serve diverse purposes, from social and professional networking to photo sharing and blogging, people frequently use them to share their thoughts, their opinions and, most importantly, their interests (e.g., politics, economy, sports). Understanding the interests of social media users is key to many applications that need to characterize them in order to recommend services and find other individuals with similar interests. In this paper, we propose two approaches to the automatic determination of the interests of social media users. The first, which we named Frisk, is an unsupervised multilingual approach that determines the interests of a user from the explicit meaning of the words that occur in the user’s posts. The second, which we termed Ascertain, is a supervised approach that resorts to the hidden dimensions of the words, which several studies have indicated are capable of revealing some of the psychological processes and personality traits of a person. In our evaluation, which we performed on two datasets obtained from Twitter, we show that Frisk is capable of inferring the interests in a multilingual context with good accuracy and that the psychological dimensions used by Ascertain are also good predictors of a user’s interests.

7 citations

Proceedings Article
01 Oct 2017
TL;DR: Experiments performed using real-world SNA data reveal that LPF can indicate the occupation to a certain extent and are more efficient than word count features and TF-IDF features, and it is revealed that good occupation profiling accuracy can be achieved by integrating LPF and LDA topic features inferred from text content.
Abstract: The flourishing of social networking applications (SNA) in recent years has revolutionized the way we live. On the other hand, SNA are also used for malicious activities. Identifying the physical person behind an SNA account is a challenging task in investigations of SNA-involved criminal cases. As one way to address this challenge, automatic user attribute inference, which is also referred to as user profiling, has become an increasingly attractive research topic. However, existing methods may suffer from problems such as low-quality content and complex social networks. The increasing amounts of user-generated geolocation data on SNA give rise to new opportunities for user profiling. In this paper, we study the problem of user profiling with user-generated geolocation data. Specifically, we take occupation profiling as an entry point. We design a supervised occupation classification framework and propose two categories of geolocation-based features, i.e., mobility pattern features (MPF) and location preference features (LPF). Experiments performed using real-world SNA data reveal that LPF can indicate the occupation to a certain extent and are more efficient than word count features and TF-IDF features. Moreover, it is also revealed that good occupation profiling accuracy can be achieved by integrating LPF and LDA topic features inferred from text content.

3 citations


Cites background from "Efficient User Profiling in Twitter..."

  • ...The goal of user profiling is inferring the missing personal attributes of a user, such as gender [1], [2], age [3], geolocation [4], [5], occupation [6], [7], interests [8]–[11], and personality [12] with observed data generated by the user and others....


References
Journal Article
TL;DR: This work investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time and indicates that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others.

4,453 citations


"Efficient User Profiling in Twitter..." refers background in this paper

  • ...Also, there are various studies on sentiment analysis and tweet analysis [8, 9] in Twitter social network alone....


Proceedings Article
05 Jul 2011
TL;DR: This paper evaluates the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging, and uses existing hashtags in the Twitter data for building training data.
Abstract: In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.

1,261 citations

Book Chapter
06 Oct 2010
TL;DR: To deal with streaming unbalanced classes, a sliding window Kappa statistic is proposed for evaluation in time-changing data streams, and a study on Twitter data is performed using learning algorithms for data streams.
Abstract: Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, and generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing on classification problems, and then consider these streams for opinion mining and sentiment analysis. To deal with streaming unbalanced classes, we propose a sliding window Kappa statistic for evaluation in time-changing data streams. Using this statistic we perform a study on Twitter data using learning algorithms for data streams.
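The idea of a sliding window Kappa statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Cohen's kappa, (p0 - pc) / (1 - pc), computed only over the most recent predictions; the class name `SlidingWindowKappa`, its methods, and the convention of returning 0.0 when kappa is undefined are hypothetical choices made for this illustration, not details taken from the paper.

```python
from collections import Counter, deque


class SlidingWindowKappa:
    """Cohen's kappa over the most recent `window_size` predictions.

    Hypothetical helper for illustration; not the cited paper's implementation.
    """

    def __init__(self, window_size):
        # A bounded deque drops the oldest pair once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        """Record one (true label, predicted label) pair from the stream."""
        self.window.append((y_true, y_pred))

    def kappa(self):
        n = len(self.window)
        if n == 0:
            return 0.0  # assumption: report 0.0 before any data arrives
        # p0: observed accuracy inside the window.
        p0 = sum(1 for t, p in self.window if t == p) / n
        # pc: agreement expected by chance, from the label marginals.
        true_counts = Counter(t for t, _ in self.window)
        pred_counts = Counter(p for _, p in self.window)
        pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
        if pc == 1.0:
            return 0.0  # assumption: kappa is undefined here, so report 0.0
        return (p0 - pc) / (1.0 - pc)


if __name__ == "__main__":
    # A classifier that always predicts the majority class on unbalanced data.
    metric = SlidingWindowKappa(window_size=10)
    for label in ["neg"] * 9 + ["pos"]:
        metric.update(label, "neg")
    print(metric.kappa())
```

On an unbalanced window like the one in the demo, plain accuracy would be 90%, while kappa discounts the agreement that a majority-class predictor gets by chance, which is why such a statistic suits unbalanced streams.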

612 citations


"Efficient User Profiling in Twitter..." refers background in this paper

  • ...Also, there are various studies on sentiment analysis and tweet analysis [8, 9] in Twitter social network alone....


Proceedings Article
26 Oct 2010
TL;DR: A "topic profile" is developed, which characterizes users' topics of interest, by discerning which categories appear frequently and cover the entities in the Tweets, and it is demonstrated that even in this early work, the main topics of interest for the users are successfully discovered.
Abstract: Twitter, a micro-blogging service, provides users with a framework for writing brief, often-noisy postings about their lives. These posts are called "Tweets." In this paper we present early results on discovering Twitter users' topics of interest by examining the entities they mention in their Tweets. Our approach leverages a knowledge base to disambiguate and categorize the entities in the Tweets. We then develop a "topic profile," which characterizes users' topics of interest, by discerning which categories appear frequently and cover the entities. We demonstrate that even in this early work we are able to successfully discover the main topics of interest for the users in our study.

360 citations


"Efficient User Profiling in Twitter..." refers background in this paper

  • ...In this context, the problem of automatically identifying user interests [15] and...


Proceedings Article
05 Jul 2011
TL;DR: It is argued that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets, and a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models is given.
Abstract: Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research has shown that a considerable fraction of these tweets are about "events," and the detection of novel events in the tweet-stream has attracted a lot of research interest. However, very little research has focused on properly displaying this real-time information about events. For instance, the leading search engines simply display all tweets matching the queries in reverse chronological order. In this paper we argue that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. We formalize the problem of summarizing event-tweets and give a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models. In addition, through extensive experiments on real-world data we show that our model significantly outperforms some intuitive and competitive baselines.

331 citations


"Efficient User Profiling in Twitter..." refers methods in this paper

  • ...A participant-based event summarization approach was proposed in Chakrabarti and Punera [10]....
