Proceedings ArticleDOI

Predicting User-to-content Links in Flickr Groups

26 Aug 2012, pp. 124-131



Citations
Journal ArticleDOI
TL;DR: This study integrates data mining with social computing to form a social network mining algorithm, which helps the individual distinguish these strong friends from a large number of friends in a specific portion of the social networks in which he or she is interested.
Abstract: Social networks are generally made of individuals who are linked by some types of interdependencies such as friendship. Most individuals in social networks have many linkages in terms of friends, connections, and/or followers. Among these linkages, some of them are stronger than others. For instance, some friends may be acquaintances of an individual, whereas others may be friends who care about him or her (e.g., who frequently post on his or her wall). In this study, we integrate data mining with social computing to form a social network mining algorithm, which helps the individual distinguish these strong friends from a large number of friends in a specific portion of the social networks in which he or she is interested. Moreover, our mining algorithm allows the individual to interactively change his or her mining parameters. Furthermore, we discuss applications of our social mining algorithm to organizational computing and e-commerce.
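
A minimal sketch of the tie-strength idea described in this abstract, assuming a hypothetical interaction log and illustrative weights; none of these names or values come from the cited paper, and the threshold stands in for the interactively tunable mining parameter.

```python
from collections import Counter

# Hypothetical interaction log: (user, friend, interaction_type) tuples,
# e.g. wall posts or comments. Names and weights are illustrative only.
interactions = [
    ("alice", "bob", "wall_post"),
    ("alice", "bob", "comment"),
    ("alice", "carol", "comment"),
    ("alice", "dave", "like"),
]

weights = {"wall_post": 2.0, "comment": 1.5, "like": 0.5}

def strong_friends(user, interactions, weights, threshold=2.0):
    """Score each friend by weighted interaction frequency and keep those
    above a user-tunable threshold (the 'mining parameter')."""
    scores = Counter()
    for u, friend, kind in interactions:
        if u == user:
            scores[friend] += weights.get(kind, 1.0)
    return {f: s for f, s in scores.items() if s >= threshold}

print(strong_friends("alice", interactions, weights))  # {'bob': 3.5}
```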

53 citations


Cites background from "Predicting User-to-content Links in..."

  • ...social networking websites or services—such as Facebook, Flickr, Google+, LinkedIn, Twitter, and Weibo (Chang et al. 2013; Gonzalez et al. 2013; Negi and Chaudhury 2012; Paul et al. 2012; Sumbaly, Kreps, and Shah 2013; Sun et al. 2013; van Laere, Schockaert, and Dhoedt 2013)—are in use...


Journal ArticleDOI
TL;DR: It is concluded that the use of the ensemble model can reduce the average correlation coefficient (as one of the evaluation criteria of the model) to 74.4 ± 16.4, which is an acceptable result.
Abstract: The nature and importance of users' comments in various social media systems play an important role in creating or changing people's perceptions of certain topics or popularizing them. Comments now have an important place in various fields, including education, sales, and prediction. In this paper, the Facebook social network is considered as a case study. The purpose of this study is to predict the volume of Facebook users' comments on published content, called posts; the problem is therefore framed as a regression problem. In the method presented in this paper, three regression models, an elastic net, an M5P model, and a radial basis function regression model, are combined into an ensemble model to predict the volume of comments. To combine these base models, a strategy called stacked generalization is used, in which the outputs of the base models are provided to a linear regression model as new features. This linear regression model combines the outputs of the three base models and determines the final output of the system. To evaluate the performance of the proposed model, a database from the UCI repository, which has 5 training sets and 10 test sets, is used. Each test set in this database has 100 records. In the present study, the efficiency of the base models and the proposed ensemble model is evaluated on all these sets. Finally, it is concluded that the use of the ensemble model can reduce the average correlation coefficient (as one of the evaluation criteria of the model) to 74.4 ± 16.4, which is an acceptable result.
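
The stacked-generalization strategy described above can be sketched with scikit-learn. The exact base learners from the paper are not all available there, so a DecisionTreeRegressor stands in for M5P and an RBF-kernel KernelRidge stands in for the radial basis function regression model; the data is synthetic and only illustrates the wiring, not the reported results.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.kernel_ridge import KernelRidge

# Toy feature matrix / comment counts; the real data would be the UCI
# Facebook comment-volume dataset mentioned above.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

base_models = [
    ("elastic_net", ElasticNet(alpha=0.1)),
    ("tree", DecisionTreeRegressor(max_depth=5)),   # stand-in for M5P
    ("rbf", KernelRidge(kernel="rbf")),             # stand-in for RBF regression
]

# Stacked generalization: base-model predictions become features for a
# linear regression meta-model that produces the final comment-volume estimate.
stack = StackingRegressor(estimators=base_models, final_estimator=LinearRegression())
stack.fit(X, y)
print(stack.predict(X[:3]))
```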

2 citations

Journal ArticleDOI
TL;DR: This paper presents preliminary work demonstrating the effectiveness of machine learning predictive algorithms on comments from the most popular social networking site, Facebook.
Abstract: The last decade has led to unconstrained growth in the importance of online social networking. Because of the huge volume of records appearing on social media, there is a great need for their automatic analysis. Social media users' comments play a basic role in building or changing one's perceptions of a specific topic or in making it popular. This paper presents preliminary work demonstrating the effectiveness of machine learning predictive algorithms on comments from the most popular social networking site, Facebook. We modeled user comment patterns over posts on Facebook Pages and predicted how many comments a post is expected to receive in the next H hours. To automate the process, we developed a software model comprising a crawler, a data processor, and a knowledge discovery module. For prediction, we used linear regression models (simple linear model, linear regression model, and Pace regression model) and non-linear regression models (decision tree, MLP) on different dataset variants and evaluated them under the evaluation metrics Hits@10, AUC@10, processing time, and mean absolute error.
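
A hedged sketch of the prediction-and-evaluation setup described above, on synthetic data. Hits@10 is interpreted here as the overlap between the ten posts with the highest predicted and the ten with the highest actual comment counts, which is one common reading of that metric rather than the paper's exact definition; Pace regression has no scikit-learn equivalent and is omitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Toy post features and comment counts in the next H hours (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = np.maximum(0, X @ rng.normal(size=8) * 5 + rng.normal(scale=2, size=300))

def hits_at_10(y_true, y_pred):
    """Overlap between the 10 posts predicted to get the most comments
    and the 10 that actually did (one common reading of Hits@10)."""
    top_true = set(np.argsort(y_true)[-10:])
    top_pred = set(np.argsort(y_pred)[-10:])
    return len(top_true & top_pred)

for name, model in [("linear", LinearRegression()),
                    ("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000))]:
    pred = model.fit(X[:200], y[:200]).predict(X[200:])
    print(name, "MAE:", mean_absolute_error(y[200:], pred),
          "Hits@10:", hits_at_10(y[200:], pred))
```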

1 citation


Cites background from "Predicting User-to-content Links in..."

  • ...Even paper [8] has predicted the formation of user-to-content links in Flickr Groups to predict the chance that a user will comment or like an image updated by another user....



References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
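
A small usage sketch of LDA with scikit-learn's variational implementation, on a made-up toy corpus; the cited work uses far larger text collections, and the Flickr paper applies the model to image tags. Topic count and corpus below are illustrative only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny illustrative corpus standing in for a real document collection.
docs = [
    "ancient castle ruins stone architecture",
    "street food market spicy noodles",
    "gothic cathedral architecture stained glass",
    "ramen noodles broth food photography",
]

counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)          # per-document topic proportions
print(doc_topics.round(2))

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-3:]]
    print(f"topic {k}:", top)
```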

27,392 citations


"Predicting User-to-content Links in..." refers methods in this paper

  • ...To discover these topics in an unsupervised manner we employ a popularly used topic model the Latent Dirichlet Allocation [24]....


Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
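
The generative process this abstract describes can be sketched directly: draw per-document topic proportions from a Dirichlet, then draw each word by first sampling a topic and then a word from that topic's distribution. Vocabulary size, topic count, and hyperparameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, alpha = 6, 2, 0.5                    # vocabulary size, topics, Dirichlet prior
beta = rng.dirichlet(np.ones(V), size=K)   # per-topic word distributions

def generate_document(n_words=10):
    theta = rng.dirichlet(alpha * np.ones(K))     # document-topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)                # sample a topic
        words.append(rng.choice(V, p=beta[z]))    # sample a word from that topic
    return theta, words

theta, words = generate_document()
print("topic proportions:", theta.round(2), "words:", words)
```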

25,546 citations

Proceedings ArticleDOI
24 Oct 2007
TL;DR: This paper examines data gathered from four popular online social networks: Flickr, YouTube, LiveJournal, and Orkut, and reports that the indegree of user nodes tends to match the outdegree; the networks contain a densely connected core of high-degree nodes; and that this core links small groups of strongly clustered, low-degree nodes at the fringes of the network.
Abstract: Online social networking sites like Orkut, YouTube, and Flickr are among the most popular sites on the Internet. Users of these sites form a social network, which provides a powerful means of sharing, organizing, and finding content and contacts. The popularity of these sites provides an opportunity to study the characteristics of online social network graphs at large scale. Understanding these graphs is important, both to improve current systems and to design new applications of online social networks. This paper presents a large-scale measurement study and analysis of the structure of multiple online social networks. We examine data gathered from four popular online social networks: Flickr, YouTube, LiveJournal, and Orkut. We crawled the publicly accessible user links on each site, obtaining a large portion of each social network's graph. Our data set contains over 11.3 million users and 328 million links. We believe that this is the first study to examine multiple online social networks at scale. Our results confirm the power-law, small-world, and scale-free properties of online social networks. We observe that the indegree of user nodes tends to match the outdegree; that the networks contain a densely connected core of high-degree nodes; and that this core links small groups of strongly clustered, low-degree nodes at the fringes of the network. Finally, we discuss the implications of these structural properties for the design of social network based systems.
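
A rough illustration with networkx of the kind of structural measurements reported above, run on a synthetic power-law graph rather than the crawled data: in/out degree correlation and the clustering of the undirected projection.

```python
import networkx as nx
import numpy as np

# Toy directed "follower" graph; the actual study crawled 11.3M users
# and 328M links from Flickr, YouTube, LiveJournal, and Orkut.
G = nx.scale_free_graph(500, seed=42)   # directed multigraph with power-law degrees
G = nx.DiGraph(G)                       # collapse parallel edges

indeg = np.array([d for _, d in G.in_degree()])
outdeg = np.array([d for _, d in G.out_degree()])
print("in/out degree correlation:", np.corrcoef(indeg, outdeg)[0, 1])

# Clustering on the undirected projection, as a rough view of the
# "strongly clustered low-degree fringe" described above.
print("average clustering:", nx.average_clustering(G.to_undirected()))
```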

3,107 citations


"Predicting User-to-content Links in..." refers background in this paper

  • ...We identify three Flickr Groups, namely Historical Places, Architecture of Days Gone By, and Food around the world....


  • ...We do this due to the fact that a large part of content-mediated interactions and social interactions happen within Flickr Groups [1]....


  • ...The only consideration when choosing a Flickr Group is that the group has sufficient “Activity” - this is achieved by sorting the Flickr Groups on “Activity”, an option provided by the Flickr platform....


  • ...To the best of our knowledge we are the first to investigate the problem of predicting user-to-content links in Flickr Groups....


  • ...The reasons for such community structures could be varied, ranging from interest in some specific aspect of the Flickr Group’s overall topic/theme to preference for a particular brand of camera/lens or regional affinity....


Journal ArticleDOI
TL;DR: Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori, and an extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition.
Abstract: A stochastic model is proposed for social networks in which the actors in a network are partitioned into subgroups called blocks. The model provides a stochastic generalization of the blockmodel. Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori. An extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition. The extended model provides a one degree-of-freedom test of the model. A numerical example from the social network literature is used to illustrate the methods.
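
A small sketch of the block structure the stochastic blockmodel formalizes, using networkx's generator with illustrative block sizes and edge probabilities. The reference is about estimating such a model from observed ties, which is not shown here; fitting would need a library such as graph-tool.

```python
import networkx as nx

# Two blocks of actors with dense within-block and sparse between-block ties
# (sizes and probabilities are illustrative only).
sizes = [20, 30]
probs = [[0.25, 0.02],
         [0.02, 0.20]]
G = nx.stochastic_block_model(sizes, probs, seed=7)

# The generator records the generating partition on the graph; a fitted SBM
# would try to recover this block structure from the observed ties.
partition = G.graph["partition"]              # list of node sets, one per block
block_of = {u: b for b, nodes in enumerate(partition) for u in nodes}
within = sum(1 for u, v in G.edges() if block_of[u] == block_of[v])
print("within-block edges:", within, "of", G.number_of_edges())
```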

2,233 citations


"Predicting User-to-content Links in..." refers methods in this paper

  • ...Unfortunately, the SBM suffers from the limitation that each user can belong to only one subgroup/block....


  • ...We employ the Stochastic Block Model (SBM) [25] approach for identifying such subgroups or communities from the interaction network....


Posted Content
TL;DR: This article proposes supervised latent Dirichlet allocation (sLDA), a statistical model of labeled documents that accommodates a variety of response types, and derives an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations.
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and the political tone of amendments in the U.S. Senate based on the amendment text. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
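
sLDA itself is not available in scikit-learn, so the sketch below shows the two-stage baseline the abstract compares against: unsupervised LDA followed by a separate regularized regression, on toy review data standing in for the movie-rating example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# Toy "reviews" and ratings standing in for the movie-review example above.
docs = ["great acting wonderful plot", "boring slow terrible pacing",
        "wonderful film great story", "terrible script boring scenes"]
ratings = np.array([5.0, 1.0, 4.5, 1.5])

X = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

# Two-stage baseline: unsupervised topics, then a separate regularized regression
# from topic proportions to the response. sLDA instead fits both jointly.
reg = Ridge(alpha=1.0).fit(topics, ratings)
print(reg.predict(topics).round(2))
```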

1,397 citations