Open Access Proceedings Article

Unsupervised graph-based topic labelling using DBpedia

TLDR
In this paper, a graph-based approach to topic labelling is proposed that builds on the DBpedia graph and uses graph centrality measures to identify the concepts that best represent each topic.
Abstract
Automated topic labelling brings benefits for users aiming to analyse and understand document collections, as well as for search engines that aim to link groups of words to their inherent topics. Current approaches to this task suffer in quality, but we argue that their performance might be improved by focusing on the structure in the data. Building upon research on concept disambiguation and linking to DBpedia, we take a novel approach to topic labelling that makes use of the structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text likely refer to concepts that lie close together in the DBpedia graph. Using graph centrality measures, we show that we are able to identify the concepts that best represent the topics. We compare our graph-based approach with the standard text-based approach on topics extracted from three corpora, using results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia can achieve better results for topic labelling in terms of both precision and topic coverage.
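The pipeline outlined in the abstract (map topic words to DBpedia concepts, rely on the hypothesis that co-occurring words point to nearby concepts, then rank candidate labels by graph centrality) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the edges in `EDGES` and the candidate senses in `CANDIDATES` are hypothetical toy data rather than real DBpedia content, entity linking and graph retrieval are not implemented, and both the disambiguation score and the seed-focused harmonic closeness used for ranking are simplified stand-ins that may differ from the centrality measures used in the paper.

```python
from collections import deque

# Hypothetical toy graph standing in for a small DBpedia neighbourhood.
# Node names mimic DBpedia resource names, but the edges are illustrative only.
EDGES = [
    ("Python_(programming_language)", "Programming_language"),
    ("Java_(programming_language)", "Programming_language"),
    ("Compiler", "Programming_language"),
    ("Compiler", "Computer_science"),
    ("Programming_language", "Computer_science"),
    ("Python_(genus)", "Snake"),
    ("Snake", "Reptile"),
    ("Java_(island)", "Indonesia"),
]

# Hypothetical candidate senses per topic word, as an entity-linking step
# might return them (entity linking itself is not implemented here).
CANDIDATES = {
    "python": ["Python_(programming_language)", "Python_(genus)"],
    "java": ["Java_(programming_language)", "Java_(island)"],
    "compiler": ["Compiler"],
}


def build_graph(edges):
    """Build an undirected graph as a dict of adjacency sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from source; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def disambiguate(graph, candidates):
    """For each word, keep the sense that lies closest to the other words' senses.

    Score of a sense = sum over the other words of 1 / (1 + d), where d is the
    distance to that word's nearest sense; unreachable words add nothing.
    """
    chosen = {}
    for word, senses in candidates.items():
        best_sense, best_score = None, -1.0
        for sense in senses:
            dist = bfs_distances(graph, sense)
            score = 0.0
            for other, other_senses in candidates.items():
                if other == word:
                    continue
                reachable = [dist[s] for s in other_senses if s in dist]
                if reachable:
                    score += 1.0 / (1 + min(reachable))
            if score > best_score:
                best_sense, best_score = sense, score
        chosen[word] = best_sense
    return chosen


def rank_labels(graph, seeds, k=3):
    """Rank all nodes by seed-focused harmonic closeness and return the top k."""
    seed_dists = [bfs_distances(graph, s) for s in seeds]
    scores = {}
    for node in graph:
        total = 0.0
        for dist in seed_dists:
            d = dist.get(node)
            if d:  # skips unreachable nodes (None) and the seed itself (0)
                total += 1.0 / d
        scores[node] = total / len(seeds)
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


graph = build_graph(EDGES)
seeds = disambiguate(graph, CANDIDATES)
labels = rank_labels(graph, list(seeds.values()))
print(seeds)
print(labels)  # ['Programming_language', 'Computer_science', 'Compiler']
```

On this toy graph the programming-language senses win the disambiguation because they are connected to the other words' senses, while the snake and island senses sit in separate components. The broader concept `Programming_language`, adjacent to all three seeds, then ranks first as a label. A real system would retrieve the neighbourhood of each candidate concept from DBpedia rather than use a fixed edge list.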

