Proceedings Article

iTopicModel: Information Network-Integrated Topic Modeling

TLDR
This paper proposes a topic modeling framework for document networks: a unified generative topic model, described by a graphical model, that accounts for both the text and the link structure of documents.
Abstract
Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online data. In this paper, we propose a novel topic modeling framework for document networks, which builds a unified generative topic model that considers both the text and the structure information of documents. A graphical model is proposed to describe the generative model. In the top layer of this graphical model, we define a novel multivariate Markov Random Field over the topic distribution random variables of the documents, to model the dependency relationships among documents over the network structure. In the bottom layer, we follow the traditional topic model to model the generation of text for each document. A joint distribution function for both the text and the structure of the documents is thus provided. A solution for estimating this topic model is given, by maximizing the log-likelihood of the joint probability. Important practical issues in real applications are also discussed, including how to choose the number of topics and how to choose a good network structure. We apply the model to two real datasets, DBLP and Cora, and the experiments show that this model is more effective than state-of-the-art topic modeling algorithms.
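
The two-layer idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the Markov Random Field or of the estimation procedure, so everything below is an assumption made for illustration: a PLSA-style bottom layer, and a top layer approximated by adding each document's edge-weighted neighbor topic distributions to its own expected topic counts in the M-step. The function name `network_smoothed_plsa` and its parameters are hypothetical, not the paper's API.

```python
import numpy as np

def network_smoothed_plsa(counts, adjacency, n_topics, n_iter=50, seed=0):
    """Hypothetical sketch: PLSA-style EM with a neighbor-smoothing term.

    counts    : (D, V) document-term count matrix
    adjacency : (D, D) nonnegative edge weights between documents
    Returns theta (D, K) document-topic and beta (K, V) topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    theta = rng.dirichlet(np.ones(n_topics), size=n_docs)   # (D, K)
    beta = rng.dirichlet(np.ones(n_words), size=n_topics)   # (K, V)

    for _ in range(n_iter):
        # E-step: p(topic k | doc d, word w) is proportional to theta[d, k] * beta[k, w]
        joint = theta[:, :, None] * beta[None, :, :]         # (D, K, V)
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        expected = counts[:, None, :] * resp                 # expected topic counts

        # M-step, bottom layer (text): re-estimate topic-word distributions
        beta = expected.sum(axis=0) + eps
        beta /= beta.sum(axis=1, keepdims=True)

        # M-step, top layer (structure), ASSUMED form: text evidence plus
        # an edge-weighted pull toward the neighbors' current topic mixtures
        theta = expected.sum(axis=2) + adjacency @ theta + eps
        theta /= theta.sum(axis=1, keepdims=True)

    return theta, beta

# Toy usage: four documents over a four-word vocabulary, linked in two pairs.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 0, 0],
                   [0, 0, 5, 4],
                   [0, 0, 4, 5]], dtype=float)
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
theta, beta = network_smoothed_plsa(counts, adjacency, n_topics=2)
print(theta.round(3))
```

Setting `adjacency` to all zeros reduces this sketch to plain PLSA-style EM on text alone, and larger edge weights pull linked documents toward similar topic mixtures. That is one simple way to see how a network structure could influence the estimated topics.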


Citations
Journal Article

PathSim: meta path-based top-K similarity search in heterogeneous information networks

TL;DR: Under the meta path framework, a novel similarity measure called PathSim is defined that can find peer objects in the network (e.g., authors in a similar field and with a similar reputation), which turns out to be more meaningful in many scenarios than random-walk based similarity measures.
Book Chapter

Mining Text Data

TL;DR: Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining.
Book Chapter

A Survey of Text Clustering Algorithms

TL;DR: This chapter will study the key challenges of the clustering problem, as it applies to the text domain, and discuss the key methods used for text clustering, and their relative advantages.
Proceedings Article

Ranking-based clustering of heterogeneous information networks with star network schema

TL;DR: This paper studies clustering of multi-typed heterogeneous networks with a star network schema and proposes a novel algorithm, NetClus, that uses links across multi-typed objects to generate high-quality, informative net-clusters.
Journal Article

Mining heterogeneous information networks: a structural analysis approach

TL;DR: A set of methodologies that can effectively and efficiently mine useful knowledge from such information networks are summarized, and some promising research directions are pointed out.