Unsupervised graph-based topic labelling using DBpedia
Ioana Hulpus, Conor Hayes, Marcel Karnstedt, Derek Greene
- pp 465-474
TLDR
In this paper, a graph-based approach to topic labelling is proposed. It draws on the DBpedia graph and uses graph centrality measures to identify the concepts that best represent the topics.
Abstract:
Automated topic labelling brings benefits for users aiming at analysing and understanding document collections, as well as for search engines targeting the linkage between groups of words and their inherent topics. Current approaches to achieve this suffer in quality, but we argue their performance might be improved by setting the focus on the structure in the data. Building upon research for concept disambiguation and linking to DBpedia, we are taking a novel approach to topic labelling by making use of structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text likely refer to concepts that belong closely together in the DBpedia graph. Using graph centrality measures, we show that we are able to identify the concepts that best represent the topics. We comparatively evaluate our graph-based approach and the standard text-based approach, on topics extracted from three corpora, based on results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia can achieve better results for topic labelling in terms of both precision and topic coverage.
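The idea of picking a label by graph centrality can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy edge list only imitates DBpedia-style links and is not taken from DBpedia, the seed concepts stand in for already disambiguated topic words, and the scoring function (a closeness-style score restricted to the seeds) is one simple choice rather than the specific centrality measures evaluated in the paper.

```python
from collections import deque

# Hypothetical toy graph: these edges only imitate DBpedia-style links
# between concepts; none of them are taken from the real DBpedia dataset.
EDGES = [
    ("Atom", "Physics"),
    ("Energy", "Physics"),
    ("Electron", "Physics"),
    ("Electron", "Atom"),
    ("Physics", "Natural_science"),
    ("Energy", "Fuel"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def closeness_to_seeds(adjacency, seeds):
    """Score each node by the inverse of its mean distance to the seed concepts.

    This is an illustrative closeness-style score, not the paper's measure.
    Nodes that cannot reach every seed are left out.
    """
    scores = {}
    for node in adjacency:
        distances = shortest_path_lengths(adjacency, node)
        others = [s for s in seeds if s != node]
        if not others or any(s not in distances for s in others):
            continue
        scores[node] = len(others) / sum(distances[s] for s in others)
    return scores


def best_label(adjacency, seeds):
    """Return the highest-scoring node; ties go to the alphabetically first name."""
    scores = closeness_to_seeds(adjacency, seeds)
    return max(sorted(scores), key=lambda name: scores[name])


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    topic_seeds = ["Atom", "Energy", "Electron"]  # hypothetical topic concepts
    print(best_label(graph, topic_seeds))
```

On this toy graph the node closest to all three seed concepts is chosen as the label candidate. A real pipeline would first need to link topic words to DBpedia concepts and extract a neighbourhood graph around them, steps that this sketch leaves out.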
Citations
Proceedings ArticleDOI
Knowledge-based graph document modeling
TL;DR: This work proposes a graph-based semantic model for representing document content that combines DBpedia's structure with an information-theoretic measure of concept association, based on its explicit semantic relations, and achieves a performance close to that of highly specialized methods that have been tuned to these specific tasks.
Journal ArticleDOI
Information extraction meets the semantic web: a survey
TL;DR: Millennium Institute for Foundational Research on Data (IMFD); Comisión Nacional de Investigación Científica y Tecnológica (CONICYT), CONICYT FONDECYT: 1181896
Book ChapterDOI
Path-Based Semantic Relatedness on Linked Data and Its Use to Word and Entity Disambiguation
TL;DR: This paper shows that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base, and proposes a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness.
Posted Content
Automatic Labelling of Topics with Neural Embeddings
TL;DR: This work proposes labelling a topic with a succinct phrase that summarises its theme or idea, using Wikipedia document titles as label candidates and computing neural embeddings for documents and words to select the most relevant labels for topics.
Book ChapterDOI
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco A. Casanova, Davide Taibi, Wolfgang Nejdl
TL;DR: This work proposes an approach for creating linked dataset profiles that generates accurate profiles even with comparably small sample sizes and outperforms established topic modelling approaches.
References
Journal ArticleDOI
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Book
Networks: An Introduction
TL;DR: This book brings together for the first time the most important breakthroughs in each of these fields and presents them in a coherent fashion, highlighting the strong interconnections between work in different areas.
Proceedings Article
Probabilistic latent semantic analysis
TL;DR: This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model, which yields a more principled approach with a solid foundation in statistics.
Journal ArticleDOI
A measure of betweenness centrality based on random walks
TL;DR: In this paper, the authors propose a measure of betweenness based on random walks, counting how often a node is traversed by a random walk between two other nodes, not just the shortest paths.