Identification of User Patterns in Social Networks by Data Mining Techniques: Facebook Case

22 Sep 2010-pp 145-153
Abstract: Currently, social networks such as Facebook or Twitter are getting more and more popular due to the opportunities they offer. As of November 2009, Facebook was the most popular and well known social network throughout the world with over 316 million users. Among the countries, Turkey is in third place in terms of Facebook users and half of them are younger than 25 years old (students). Turkey has 14 million Facebook members. The success of Facebook and the rich opportunities offered by social media sites lead to the creation of new web based applications for social networks and open up new frontiers. Thus, discovering the usage patterns of social media sites might be useful in making decisions about the design and implementation of those applications as well as educational tools. Therefore, in this study, the factors affecting “Facebook usage time” and “Facebook access frequency” are revealed via various predictive data mining techniques, based on a questionnaire administered to 570 Facebook users. At the same time, the associations of the students’ opinions on the contribution of Facebook in an educational aspect are investigated by employing the association rules method.

Summary

1 Introduction

  • In recent years, a rapid increase in numbers of social networks along with numbers of people using these networks has been observed.
  • The majority of these users have integrated such sites into their daily lives.
  • There have been various studies about social networks in the educational context including using social networks as a tool or utilizing them as an environment for courses [6], [7], the utility of social networks in the teaching and learning process [8], their value for communication and collaboration [9], educational usage themes of social networks (e.g. [10], [11]).
  • As one of the most popular social networks, Facebook is considered in the present study.
  • Data mining is a process that uses a variety of data analysis tools to discover patterns and relations in data that may be used for prediction purposes.

2 Data Mining

  • In other words, data mining is the complete process of revealing useful patterns and relationships in data by using techniques like artificial intelligence, machine learning and statistics via advanced data analysis tools.
  • Data mining methods are classified into two categories as predictive and descriptive.
  • The goal of descriptive methods is discovering deep relationships, correlations and descriptive properties of data.
  • Both of these method groups are employed by using SPSS Clementine 12.
  • Furthermore, the variable importance feature of SPSS Clementine is used in discovering the factors affecting “Facebook usage” and “Facebook access frequency”.

2.1 Methodology

  • As stated previously, various data mining techniques are employed during the analyses and, with the exception of association rule mining, their prediction performances are compared.
  • The main idea of a decision tree is to split the data recursively into subsets so that each subset covers more or less homogeneous states of the dependent variable.
  • On the other hand, in the pattern recognition literature, SVM (Support Vector Machine) is a state-of-the-art method with its powerful discriminative features in linear and non-linear classifications.
  • The weights in the network are determined in a training phase of the network using training data.
  • Agrawal, Imielinski and Swami introduced a new approach to mining association rules in 1993 and designed a new algorithm, namely Apriori, which uses a two-phase search over itemsets and examines their association frequencies (Romero & Ventura, 2007).

3 Data

  • Data was collected from 570 active Turkish Facebook users with an online poll.
  • Thus members’ views of Facebook in relation to its educational usage were sought.
  • The variable names of the first part and available answers are given in Table 1.
  • Therefore, the final dataset comprised 570 people.
  • In the dataset, male and female participants are almost equal in number and more than 400 participants are in the 18-25 age range.

4 Application of Data Mining

  • To discover important factors that affect Facebook usage time and access frequency to Facebook, CART, CHAID, C5, artificial neural network and SVM algorithms, which are built in to SPSS Clementine 12, were employed on the dataset at hand (see Fig. 1).
  • The overall data is partitioned as 80% training and 20% testing, respectively.
  • Therefore, it is considered that the variable importance results of SVM are the most accurate predictions.
  • Again, it can be clearly seen that age, membership in student groups and usage time variables are the most important factors affecting access frequency to Facebook.
  • Therefore, the rules which have lift values higher than 1 should be considered carefully for educational purposes.

5 Discussion and Conclusion

  • This study tried to discover the factors affecting access frequency and usage time of Facebook by using various decision tree algorithms, ANN and the state-of-the-art SVM algorithm.
  • According to the results, SVM exhibits the most accurate results due to the nature of the dataset at hand.
  • On the other hand, the associations among the students’ opinions were explored by employing the Apriori algorithm and, as can be seen from the results obtained, Facebook contributes more to communication between classmates than to communication between students and teachers.
  • If the increasing trend in social network sites usage is considered, the importance of applications and approaches related to social networks can be easily understood.
  • Targeting specific ages or sex may strategically affect the success of developed applications.

S. Kurbanoğlu et al. (Eds.): IMCW 2010, CCIS 96, pp. 145–153, 2010.
© Springer-Verlag Berlin Heidelberg 2010
Identification of User Patterns in Social Networks by Data Mining Techniques: Facebook Case
A. Selman Bozkır (1), S. Güzin Mazman (2), and Ebru Akçapınar Sezer (1)
(1) Hacettepe University, Department of Computer Engineering, Ankara, Turkey
selman@cs.hacettepe.edu.tr, ebru@hacettepe.edu.tr
(2) Hacettepe University, Department of Computer Education and Instructional Technologies, Ankara, Turkey
s.guzin@gmail.com
Abstract. Currently, social networks such as Facebook or Twitter are getting
more and more popular due to the opportunities they offer. As of November
2009, Facebook was the most popular and well known social network throughout
the world with over 316 million users. Among the countries, Turkey is in
third place in terms of Facebook users and half of them are younger than 25
years old (students). Turkey has 14 million Facebook members. The success of
Facebook and the rich opportunities offered by social media sites lead to the
creation of new web based applications for social networks and open up new
frontiers. Thus, discovering the usage patterns of social media sites might be
useful in taking decisions about the design and implementation of those
applications as well as educational tools. Therefore, in this study, the factors
affecting “Facebook usage time” and “Facebook access frequency” are revealed via
various predictive data mining techniques, based on a questionnaire applied on 570
Facebook users. At the same time, the associations of the students’ opinions on
the contribution of Facebook in an educational aspect are investigated by
employing the association rules method.
Keywords: Social networks, decision trees, Facebook, association rules.
1 Introduction
In recent years, a rapid increase in numbers of social networks along with numbers of
people using these networks has been observed. Social networks, also called social
software or collaborative software, are a range of applications that augment group
interactions and shared spaces for collaboration, social connections, and aggregate
information exchanges in a web-based environment [1]. Similarly, [2] defined social
networks as web-based services allowing individuals to (1) construct a public or
semi-public profile within a bounded system, (2) articulate a list of other users with
whom they share a connection, and (3) view and traverse their list of connections and
those made by others within the system.
Since the introduction of social network sites (SNSs) such as MySpace, Facebook,
Cyworld, Bebo and Twitter, millions of users have been attracted to them. The
majority of these users have integrated such sites into their daily lives. Because most
of the social network users are young individuals, many of them are university
students. Therefore, these sites are considered to play an active role in the younger
generation’s daily life [3], [4]. On the other hand, it has been stated that social networks
have a prominent educational context, and this prominence has prompted a growing
number of educators to consider them to be important sites for student learning,
although they are not intended primarily as educational applications. Besides, it has
been suggested that these social networks help users re-situate learning in an
open-ended social context by providing opportunities for moving beyond the mere access to
content (learning about) to the social application of knowledge in a constant process
of re-orientation (learning as becoming) [5].
There have been various studies about social networks in the educational context
including using social networks as a tool or utilizing them as an environment for
courses [6], [7], the utility of social networks in the teaching and learning process [8],
their value for communication and collaboration [9], educational usage themes of
social networks (e.g. [10], [11]). However, no study applying data mining to the
analysis of social network usage has been encountered in the literature.
As one of the most popular social networks, Facebook is considered in the present
study. Facebook is defined as “a social utility that helps people share information and
communicate more efficiently with their friends, family and co-workers”
(facebook.com). As of November 2009, with 316 million users, Facebook is the most
popular and well known social network throughout the world. Moreover, Turkey,
with 14 million members, ranks third among countries in number of Facebook users,
and half of these members are younger than 25 years old [12].
Data mining is a process that uses a variety of data analysis tools to discover
patterns and relations in data that may be used for prediction purposes. Supervised data
mining techniques are used to model an output variable based on one or more input
variables and these models can be used to predict or forecast future cases [13].
The purpose of the present study is to discover some usage patterns (i.e. usage time
and access frequency) of Facebook users by data mining techniques. Additionally, an
attempt is made to reveal the educational associations of the users. It is believed that
social network based application development and educational programs can be
enhanced by the findings of this study.
2 Data Mining
Data mining is the process of exploration and analysis, by automatic or
semi-automatic means, of large quantities of data in order to discover useful patterns [13].
In other words, data mining is the complete process of revealing useful patterns and
relationships in data by using techniques like artificial intelligence, machine learning
and statistics via advanced data analysis tools. Oracle BI, SPSS Clementine, SAS
Enterprise Miner and Microsoft Analysis Services are well known data mining tools
in the marketplace [14].
Data mining methods are classified into two categories as predictive and
descriptive. The aim of predictive methods is to make predictions on unseen cases by using
seen cases via a trained model. However, the goal of descriptive methods is
discovering deep relationships, correlations and descriptive properties of data.
In this study, both of these method groups are employed by using SPSS
Clementine 12. Various decision tree algorithms such as CART, CHAID and C5,
artificial neural networks (ANN) and SVM (Support Vector Machine) classifiers are
used to predict the target variables. Furthermore, the variable importance feature of
SPSS Clementine is used in discovering the factors affecting “Facebook usage” and
“Facebook access frequency”. Likewise, the Apriori algorithm is employed in
discovering frequent opinions of students on the educational benefits of Facebook usage.
2.1 Methodology
As stated previously, various data mining techniques are employed during the
analyses and, with the exception of association rule mining, their prediction performances
are compared. Thus, in this section, brief information is presented about the
methodologies followed.
The decision tree method is probably the most popular classification method
among the data mining techniques due to its ease of use and visual interpretation
capabilities. Typically, a data mining task for a decision tree is classification; for
example, to identify the credit risk for each customer [15]. The main idea of a
decision tree is to split the data recursively into subsets so that each subset covers more or
less homogeneous states of the dependent variable. At each split in the tree, all
independent variables are recalculated for their impact on the dependent variable.
When this recursive process is stopped and the tree is in a stable state, the required
decision tree is formed [15]. At this stage, new cases can be classified via the
decision tree. This stage is called tree deduction. C5, Quest, CHAID [16] and CART [17]
are well-known decision tree algorithms, and SPSS Clementine provides all of them
in its package. In essence, the differences among these algorithms are mainly caused
by their technical capabilities and by the splitting approaches and functions they
employ. For instance, C5 and CHAID are designed to classify only discrete valued
variables, using the “gain ratio” and chi-square significance test splitting approaches,
respectively. CART, in contrast, splits on the “gini value” and is designed for both
classification and regression purposes.
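As an illustration of how a splitting measure scores a candidate split, the following minimal Python sketch computes the gain ratio and the reduction in Gini impurity for a discrete attribute. It is not the SPSS Clementine implementation; the function names and the four toy records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, attribute, target):
    """Group the target labels by the value of one discrete attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return list(groups.values())


def gain_ratio(rows, attribute, target):
    """Information gain of the split divided by its split information."""
    labels = [row[target] for row in rows]
    groups = partition(rows, attribute, target)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups)
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def gini_gain(rows, attribute, target):
    """Reduction in Gini impurity obtained by splitting on the attribute."""
    labels = [row[target] for row in rows]
    groups = partition(rows, attribute, target)
    n = len(rows)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)


# Hypothetical toy records (not taken from the study's dataset).
rows = [
    {"sex": "F", "student_group": "yes", "usage": "long"},
    {"sex": "F", "student_group": "no", "usage": "long"},
    {"sex": "M", "student_group": "yes", "usage": "short"},
    {"sex": "M", "student_group": "no", "usage": "short"},
]

for attribute in ("sex", "student_group"):
    print(attribute,
          round(gain_ratio(rows, attribute, "usage"), 3),
          round(gini_gain(rows, attribute, "usage"), 3))
```

In this toy data, "sex" separates the target perfectly and therefore scores higher than "student_group" on both measures, so a tree builder would split on it first.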
On the other hand, in the pattern recognition literature, SVM (Support Vector
Machine) is a state-of-the-art method with its powerful discriminative features in linear
and non-linear classifications. Generally, SVM is designed to maximize the margin
between any two classes in pattern space by searching for an optimal hyperplane that has
maximum distance to the closest points of the two classes, which are termed support
vectors [18]. SVM has also been extended to multiclass predictions, and different
kernel functions have been developed for it. With the help of these kernel functions,
solving problems in higher dimensional spaces becomes possible.
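For reference, the margin idea described above is usually written as the following optimization problem. This is the standard textbook hard-margin formulation rather than anything specific to this chapter; the symbols $\mathbf{w}$, $b$, $\mathbf{x}_i$, $y_i \in \{-1, +1\}$, $\alpha_i$, $\varphi$ and $K$ are introduced here for illustration only.

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n, \\
&\text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}, \\
&f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^{\top}\varphi(\mathbf{x}_j).
\end{aligned}
```

Minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin, and the training points with $\alpha_i > 0$ are the support vectors. The kernel $K$ replaces the inner product so that the separating hyperplane is found in the higher dimensional space induced by $\varphi$ without computing $\varphi$ explicitly.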
ANNs are systems which contain intelligent nodes arranged in layers. In essence,
an ANN has an input layer, a hidden layer, and an output layer. The nodes in the
hidden layer collect the inputs from the input layer into a single output value which is
passed on to the output layer. Associated with each node in the network is a weight.
The weights in the network are determined in a training phase of the network using
training data. The network performance is then tested on the remaining data, or
hold-out sample [19].
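The layered computation described above can be sketched as a single forward pass through a tiny network. This is a minimal sketch under stated assumptions: the sigmoid activation, the attachment of weights to connections plus a bias per node, and all numeric values are hypothetical, and no training step is shown.

```python
from math import exp


def sigmoid(z):
    """Logistic activation squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def layer(inputs, weights, biases):
    """Each node takes a weighted sum of its inputs plus a bias,
    then applies the activation to produce a single output value."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]

prediction = forward([1.0, 0.0], hidden_w, hidden_b, output_w, output_b)
print(prediction)
```

In a trained network the weights would be fitted on the training data and the prediction compared against the hold-out sample.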
Association rule mining has been one of the best studied descriptive mining
methods since its introduction. Agrawal, Imielinski and Swami introduced a new
approach to mining association rules in 1993 and designed a new algorithm, namely
Apriori, which uses a two-phase search over itemsets and examines their association
frequencies (Romero & Ventura, 2007). In the second stage of this study, the analyses
are performed by using the Apriori algorithm. In association rule mining, the
support, rule support, confidence and lift values are the important parameters for
evaluating the usefulness of rules. In this study, lift and support values are considered.
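To make the evaluation parameters named above concrete, the following minimal sketch computes support, confidence and lift for one rule. The definitions are the common textbook ones (SPSS Clementine's own "support" and "rule support" outputs may be defined slightly differently), and the four toy transactions and item names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (rule support, confidence, lift) for antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a      # P(consequent | antecedent)
    lift = confidence / s_c      # > 1 means positive association
    return s_ac, confidence, lift


# Hypothetical "agreed with" item sets, one per respondent.
transactions = [
    {"classmates", "homework"},
    {"classmates", "homework"},
    {"classmates"},
    {"teachers"},
]

rule_support, confidence, lift = rule_metrics(
    transactions, {"classmates"}, {"homework"}
)
print(rule_support, confidence, lift)
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.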
Table 1. Variable names and available answers in the first part of the poll
Variable name | Type | Available answers and related distributions
Sex | Discrete | Male (50%) / Female (50%)
Age | Discrete | 18-25 (74.1%) / 26-35 (20.53%) / 36-40 (3.86%) / 41 and above (1.4%)
Frequency of access to Facebook | Discrete | Once a year (0.18%) / Once a month (2.98%) / Several times a week (25.26%) / Once a day (22.81%) / Several times a day (48.77%)
Facebook usage time | Discrete | Less than 15 mins. (32.28%) / Half an hour (39.82%) / 1 hour (14.39%) / 1-3 hours (8.6%) / More than 3 hours (4.74%)
Education level | Discrete | High School (5.96%) / Bachelor (70.35%) / Master (23.16%)
Membership in any group | Discrete | Yes (99.82%) / No (0.18%)
Membership in student groups | Discrete | Yes (86.49%) / No (13.51%)
Membership in common interest groups | Discrete | Yes (77.54%) / No (22.46%)
Membership in internet & tech groups | Discrete | Yes (27.02%) / No (72.98%)
Membership in organizations | Discrete | Yes (61.93%) / No (38.07%)
3 Data
Data was collected from 570 active Turkish Facebook users (students) with an online
poll. This online poll consisted of two sections. In the first section, demographic
characteristics of Facebook users and their frequency of Facebook usage, length of
time spent on Facebook, and memberships in Facebook groups were collected. In the
second section, participants rated 11 opinions on a 10-point Likert scale ranging
from 1 (strongly disagree) to 10 (strongly agree), such as “Facebook contributes
to communication between classmates” and “It’s useful for assigning tasks in classes and
homework assignments”. Thus members’ views of Facebook in relation to its
educational usage were sought.
The variable names of the first part and available answers are given in Table 1.
Although the initial dataset size was larger than 570 people, during the data cleaning
and transforming steps, 13 people were removed due to the absence of sufficient
information. Therefore, the final dataset comprised 570 people. In the dataset, male and
female participants are almost equal in number and more than 400 participants are in
the 18-25 age range. Furthermore, almost all students are at either undergraduate or
graduate level.
4 Application of Data Mining
To discover important factors that affect Facebook usage time and access frequency
to Facebook, CART, CHAID, C5, artificial neural network and SVM algorithms,
which are built into SPSS Clementine 12, were employed on the dataset at hand (see
Fig. 1). The overall data is partitioned into 80% training and 20% testing sets.
Training and test datasets are selected randomly. As the dataset consists of discrete
valued variables, the true and false prediction rates are listed.
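The random 80/20 partition and the true and false prediction rates can be sketched as follows. This is an illustrative sketch rather than the SPSS Clementine procedure; the function names, the fixed seed, and the integer stand-ins for the 570 records are assumptions made for the example.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it at the requested fraction."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_rates(actual, predicted):
    """Percentages of correctly and incorrectly classified cases."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    true_rate = 100.0 * correct / len(actual)
    return true_rate, 100.0 - true_rate


# Integer stand-ins for the 570 poll records.
records = list(range(570))
train, test = train_test_split(records)
print(len(train), len(test))

# Hypothetical labels: three of four predictions match.
print(classification_rates(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

With 570 records, an 80% cut places 456 records in the training set and 114 in the test set.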
According to the results (see Table 2), SVM achieves the most accurate predictions
for both target variables. Therefore, the variable importance results of SVM are
considered the most accurate. As can be seen in Fig. 2, sex, education level,
membership in a group and membership in any common interest groups are
the most important factors affecting Facebook usage time. Sex plays a crucial role in
Facebook usage time with 68%. Again, it can be clearly seen that age, membership in
student groups and usage time variables are the most important factors affecting
access frequency to Facebook. The effect of age is more than 80% in access
frequency.
Table 2. Applied algorithms and prediction results
Target variable | Applied algorithm | True classification | False classification
Facebook usage | SVM | 62.63 % | 37.37 %
Facebook usage | ANN | 47.72 % | 52.28 %
Facebook usage | C5 | 47.54 % | 52.46 %
Facebook usage | CART | 43.68 % | 56.32 %
Facebook usage | CHAID | 41.40 % | 58.60 %
Access frequency to Facebook | SVM | 69.65 % | 30.35 %
Access frequency to Facebook | C5 | 55.79 % | 44.21 %
Access frequency to Facebook | CART | 52.81 % | 47.19 %
Access frequency to Facebook | CHAID | 50.35 % | 49.65 %
Access frequency to Facebook | ANN | 48.77 % | 51.23 %

References
Journal ArticleDOI
TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples.
Abstract: Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 14-23 DOI: 10.1002/widm.8 This article is categorized under: Technologies > Classification Technologies > Machine Learning Technologies > Prediction Technologies > Statistical Fundamentals

16,974 citations

Journal ArticleDOI
TL;DR: This publication contains reprint articles for which IEEE does not hold copyright and which are likely to be copyrighted.
Abstract: Social network sites SNSs are increasingly attracting the attention of academic and industry researchers intrigued by their affordances and reach This special theme section of the Journal of Computer-Mediated Communication brings together scholarship on these emergent phenomena In this introductory article, we describe features of SNSs and propose a comprehensive definition We then present one perspective on the history of such sites, discussing key changes and developments After briefly summarizing existing scholarship concerning SNSs, we discuss the articles in this special section and conclude with considerations for future research

14,912 citations

Book
01 Jan 1983
TL;DR: The methodology used to construct tree structured rules is the focus of a monograph as mentioned in this paper, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Abstract: The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

14,825 citations

Journal ArticleDOI
TL;DR: The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable with built-in significance testing, multi-way splits, and a new type of predictor which is especially useful in handling missing information.
Abstract: SUMMARY The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable. Some important modifications which are relevant to standard AID include: built-in significance testing with the consequence of using the most significant predictor (rather than the most explanatory), multi-way splits (in contrast to binary) and a new type of predictor which is especially useful in handling missing information.

2,744 citations


Additional excerpts

  • ...For instance, C5 and CHAID algorithms are designed to classify only discrete valued variables by using “gain ratio” and “gini value” splitting approaches, respectively....

    [...]

  • ...To discover important factors that affect Facebook usage time and access frequency to Facebook, CART, CHAID, C5, artificial neural network and SVM algorithms, which are built in to SPSS Clementine 12, were employed on the dataset at hand (see Figure 1)....

    [...]

  • ...C5, Quest, CHAID (Kass, 1980) and CART (Breiman, Friedman, Olshen, & Stone, 1984) are well-known decision tree algorithms....

    [...]

  • ...Additionally, various decision trees algorithms such as CART, CHAID and C5; artificial neural networks (ANN) and SVM (Support Vector Machine) classifiers in prediction of target variables are used....

    [...]
