Identification of User Patterns in Social Networks by Data Mining Techniques: Facebook Case
Summary
1 Introduction
- In recent years, both the number of social networks and the number of people using them have increased rapidly.
- The majority of these users have integrated such sites into their daily lives.
- There have been various studies about social networks in the educational context, covering the use of social networks as a tool or environment for courses [6], [7], their utility in the teaching and learning process [8], their value for communication and collaboration [9], and their educational usage themes (e.g. [10], [11]).
- As one of the most popular social networks, Facebook is considered in the present study.
- Data mining is a process that uses a variety of data analysis tools to discover patterns and relations in data that may be used for prediction purposes.
2 Data Mining
- In other words, data mining is the complete process of revealing useful patterns and relationships in data by using techniques like artificial intelligence, machine learning and statistics via advanced data analysis tools.
- Data mining methods are classified into two categories: predictive and descriptive.
- The goal of descriptive methods is discovering deep relationships, correlations and descriptive properties of data.
- Both groups of methods are applied using SPSS Clementine 12.
- Furthermore, the variable importance feature of SPSS Clementine is used in discovering the factors affecting “Facebook usage” and “Facebook access frequency”.
2.1 Methodology
- As stated previously, various data mining techniques are employed in the analyses, and the prediction performances of all but one (association rule mining) are compared.
- The main idea of a decision tree is to split the data recursively into subsets so that each subset covers more or less homogeneous states of the dependent variable.
- On the other hand, in the pattern recognition literature, SVM (Support Vector Machine) is a state-of-the-art method with its powerful discriminative features in linear and non-linear classifications.
- The weights of an artificial neural network are determined during a training phase using training data.
- Agrawal, Imielinski and Swami proposed a new approach to mining association rules in 1993 and designed a new algorithm, namely Apriori, which searches itemsets in two phases and examines their association frequencies (Romero & Ventura, 2007).
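The level-wise search behind Apriori can be illustrated with a minimal sketch. It is not the SPSS Clementine implementation used in the study; the function name `apriori_frequent_itemsets`, the `min_support` threshold and the toy answer sets are all assumptions made for illustration. Each pass joins the frequent itemsets of the previous size into larger candidates, discards candidates that contain an infrequent subset, and then counts the survivors against the data.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        previous = list(current)
        candidates = {a | b for a, b in combinations(previous, 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: keep only candidates that are frequent in the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical poll answers, one set of agreed statements per respondent.
answers = [
    {"helps_classmates", "uses_groups", "helps_teachers"},
    {"helps_classmates", "uses_groups"},
    {"helps_classmates", "helps_teachers"},
    {"uses_groups"},
    {"helps_classmates", "uses_groups"},
]

frequent = apriori_frequent_itemsets(answers, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most candidates are rejected before the data is scanned.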
3 Data
- Data was collected from 570 active Turkish Facebook users with an online poll.
- Thus members’ views of Facebook in relation to its educational usage were sought.
- The variable names of the first part and available answers are given in Table 1.
- Therefore, the final dataset comprised 570 people.
- In the dataset, the numbers of male and female participants are almost equal, and more than 400 participants are in the 18-25 age range.
4 Application of Data Mining
- To discover important factors that affect Facebook usage time and access frequency, the CART, CHAID, C5, artificial neural network and SVM algorithms, which are built into SPSS Clementine 12, were applied to the dataset at hand (see Fig. 1).
- The overall data is partitioned into 80% training and 20% testing sets.
- Therefore, it is considered that the variable importance results of SVM are the most accurate predictions.
- Again, it can be clearly seen that age, membership in student groups and usage time variables are the most important factors affecting access frequency to Facebook.
- Therefore, the rules which have lift values higher than 1 should be considered carefully for educational purposes.
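As a hedged illustration of why a lift of 1 is the natural threshold: lift compares a rule's confidence with the baseline frequency of its consequent, so values above 1 mean the antecedent makes the consequent more likely than it is overall. The sketch below computes the standard support, confidence and lift definitions; the helper `rule_metrics` and the toy answer sets are hypothetical and do not come from the survey data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    count_a = sum(antecedent <= t for t in transactions)
    count_c = sum(consequent <= t for t in transactions)
    count_both = sum((antecedent | consequent) <= t for t in transactions)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = count_both / n              # P(A and C)
    confidence = count_both / count_a     # P(C | A)
    lift = confidence / (count_c / n)     # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical poll answers, one set of agreed statements per respondent.
answers = [
    {"helps_classmates", "uses_groups", "helps_teachers"},
    {"helps_classmates", "uses_groups"},
    {"helps_classmates", "helps_teachers"},
    {"uses_groups"},
    {"helps_classmates", "uses_groups"},
]

support, confidence, lift = rule_metrics(
    answers, {"helps_teachers"}, {"helps_classmates"})
weak_support, weak_confidence, weak_lift = rule_metrics(
    answers, {"uses_groups"}, {"helps_classmates"})

print("teachers -> classmates lift:", round(lift, 4))
print("groups -> classmates lift:", round(weak_lift, 4))
```

In this toy data the first rule has a lift above 1 and the second a lift below 1, so only the first would pass the filter described above.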
5 Discussion and Conclusion
- This study tried to discover the factors affecting access frequency and usage time of Facebook using various decision tree algorithms, ANN and the state-of-the-art SVM algorithm.
- According to the results, SVM exhibits the most accurate results due to the nature of the dataset at hand.
- On the other hand, associations among the students' views were explored with the Apriori algorithm; the results indicate that Facebook contributes more to communication between classmates than to communication between students and teachers.
- Given the increasing trend in social network site usage, the importance of applications and approaches related to social networks can be easily understood.
- Targeting specific ages or sex may strategically affect the success of developed applications.
Additional excerpts
...For instance, C5 and CHAID algorithms are designed to classify only discrete valued variables by using “gain ratio” and “gini value” splitting approaches, respectively....
[...]
...C5, Quest, CHAID (Kass, 1980) and CART (Breiman, Friedman, Olshen, & Stone, 1984) are well-known decision tree algorithms....
[...]
...Additionally, various decision trees algorithms such as CART, CHAID and C5; artificial neural networks (ANN) and SVM (Support Vector Machine) classifiers in prediction of target variables are used....
[...]