Modeling Social Annotation: a Bayesian Approach

This article proposes a probabilistic model that takes the interests of individual annotators into account in order to find hidden topics of annotated resources. Such a model can be used to infer categorical knowledge, classify documents, or recommend new relevant information.

Abstract:

Collaborative tagging systems, such as Delicious, CiteULike, and others, allow users to annotate resources, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations contributed by thousands of users can potentially be used to infer categorical knowledge, classify documents, or recommend new relevant information. Traditional text inference methods do not make the best use of social annotation, since they do not take into account variations in individual users' perspectives and vocabulary. In previous work, we introduced a simple probabilistic model that takes the interests of individual annotators into account in order to find hidden topics of annotated resources. Unfortunately, that approach had one major shortcoming: the number of topics and interests had to be specified a priori. To address this drawback, we extend the model to a fully Bayesian framework, which offers a way to estimate these numbers automatically. In particular, the model allows the number of interests and topics to change as suggested by the structure of the data. We evaluate the proposed model in detail on synthetic and real-world data by comparing its performance to Latent Dirichlet Allocation on the topic extraction task. For the real-world evaluation, we apply the model to infer topics of Web resources from social annotations obtained from Delicious in order to discover new resources similar to a specified one. Our empirical results demonstrate that the proposed model is a promising method for exploiting social knowledge contained in user-generated annotations.
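To make the last evaluation step concrete, the following is a minimal sketch of how resources might be ranked by similarity once each one has an inferred topic distribution. The abstract does not say which similarity measure is used; the Jensen-Shannon divergence here is an assumption, and the function names, resource names, and topic mixtures are all hypothetical illustrations rather than the authors' implementation or results.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def rank_similar(query, topic_mixtures):
    """Return the names of all other resources, most similar to `query` first."""
    target = topic_mixtures[query]
    others = [name for name in topic_mixtures if name != query]
    return sorted(others, key=lambda name: js_divergence(target, topic_mixtures[name]))

# Hypothetical topic mixtures over three topics (made-up numbers, not from the paper).
topic_mixtures = {
    "python-tutorial": [0.80, 0.15, 0.05],
    "numpy-guide": [0.70, 0.25, 0.05],
    "pasta-recipes": [0.05, 0.05, 0.90],
}
ranking = rank_similar("python-tutorial", topic_mixtures)
```

The divergence is symmetric and stays finite even when one distribution assigns zero probability to a topic, which is convenient for sparse topic mixtures. Any other distance between distributions could be substituted in `rank_similar`.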

Citations

Proceedings Article

Clustering Blog Posts Using Tags and Relations in the Blogosphere

Yin Zhang, Kening Gao, Bin Zhang, Jinhua Guo, Feihang Gao, Pengwei Guo; Northeastern University (China)

TL;DR: This paper proposes a folksonomy-extending framework that uses relations in the Blogosphere, demonstrates how the framework can help cluster blog posts, and evaluates it against a content-based folksonomy extension approach.

Community-oriented models and applications for the social web

James Caverlee, Said Kashoob; Texas A&M University

TL;DR: This dissertation provides a foundation for the study of how social knowledge networks are self-organized, a deeper understanding and appreciation of the factors impacting collective intelligence, and the creation of new information access algorithms for leveraging these communities.

References

Journal Article

Latent dirichlet allocation

David M. Blei, Andrew Y. Ng, Michael I. Jordan; University of California, Berkeley; Stanford University

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Journal Article

The anatomy of a large-scale hypertextual Web search engine

Sergey Brin, Lawrence Page; Stanford University

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Journal Article

Authoritative sources in a hyperlinked environment

Jon Kleinberg; Cornell University

- 01 Sep 1999 -

Journal of the ACM

TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.

Journal Article

Divergence measures based on the Shannon entropy

J. Lin; Brandeis University

- 01 Jan 1991 -

IEEE Transactions on Information Theory

TL;DR: A novel class of information-theoretic divergence measures based on the Shannon entropy is introduced, which do not require the condition of absolute continuity to be satisfied by the probability distributions involved and are established in terms of bounds.

Journal Article

Hierarchical Dirichlet Processes

Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, David M. Blei

- 01 Dec 2006 -

Journal of the American Statistical Asso...

TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. It proposes a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.
