Proceedings Article
iTopicModel: Information Network-Integrated Topic Modeling
Yizhou Sun,Jiawei Han,Jing Gao,Yintao Yu +3 more
- pp 493-502
TLDR
A novel topic modeling framework is proposed, which builds a unified generative topic model that is able to consider both text and structure information for documents, and a graphical model is proposed to describe the generative model.
Abstract:
Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online data. In this paper, we propose a novel topic modeling framework for document networks, which builds a unified generative topic model that is able to consider both text and structure information for documents. A graphical model is proposed to describe the generative model. On the top layer of this graphical model, we define a novel multivariate Markov Random Field for the topic distribution random variables of each document, to model the dependency relationships among documents over the network structure. On the bottom layer, we follow the traditional topic model to model the generation of text for each document. A joint distribution function for both the text and structure of the documents is thus provided. A solution for estimating this topic model is given by maximizing the log-likelihood of the joint probability. Some important practical issues in real applications are also discussed, including how to decide the topic number and how to choose a good network structure. We apply the model to two real datasets, DBLP and Cora, and the experiments show that this model is more effective than state-of-the-art topic modeling algorithms.
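The two-layer objective described in the abstract can be illustrated with a small sketch. The abstract does not give the functional form of the Markov Random Field, so the edge potential used here (an edge-weighted agreement score between the topic distributions of linked documents) is an assumption made for illustration only, not the paper's definition. The function names, the toy data, and the PLSA-style text term are likewise hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def text_log_likelihood(counts: List[List[int]], theta: Matrix, beta: Matrix) -> float:
    """Bottom layer: log-likelihood of the words under a mixture-of-topics model.

    counts[d][w] is the count of word w in document d,
    theta[d][k] is the topic distribution of document d,
    beta[k][w] is the word distribution of topic k.
    """
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            p_word = sum(theta[d][k] * beta[k][w] for k in range(len(beta)))
            total += n * math.log(p_word)
    return total


def structure_log_potential(edges: Dict[Tuple[int, int], float], theta: Matrix) -> float:
    """Top layer (ASSUMED form): reward linked documents whose topic
    distributions agree. Each directed edge (i, j) with weight w contributes
    w * sum_k theta[j][k] * log(theta[i][k]).
    Requires theta[i][k] > 0 wherever theta[j][k] > 0.
    """
    total = 0.0
    for (i, j), weight in edges.items():
        total += weight * sum(
            tj * math.log(ti) for ti, tj in zip(theta[i], theta[j]) if tj > 0
        )
    return total


def joint_objective(counts, edges, theta, beta) -> float:
    """Sum of the text term and the (assumed) network term."""
    return text_log_likelihood(counts, theta, beta) + structure_log_potential(edges, theta)


# Toy data (hypothetical): two linked documents, two topics, two words.
counts = [[3, 1], [3, 1]]
beta = [[0.9, 0.1], [0.1, 0.9]]
edges = {(0, 1): 1.0, (1, 0): 1.0}
theta_similar = [[0.8, 0.2], [0.8, 0.2]]
theta_dissimilar = [[0.8, 0.2], [0.2, 0.8]]

if __name__ == "__main__":
    print(joint_objective(counts, edges, theta_similar, beta))
    print(joint_objective(counts, edges, theta_dissimilar, beta))
```

In this toy setting the objective is larger when the two linked documents share a topic distribution, which is the qualitative behavior a network-aware topic model is meant to encourage. An estimation procedure would then search for the `theta` and `beta` that maximize this joint quantity.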