Unsupervised graph-based topic labelling using dbpedia

doi:10.1145/2433396.2433454

Proceedings Article (Open Access)

pp. 465-474

TLDR

In this paper, a graph-based approach for topic labelling is proposed. It is based on the DBpedia graph and uses graph centrality measures to identify the concepts that best represent the topics.

Abstract:

Automated topic labelling brings benefits for users who aim to analyse and understand document collections, as well as for search engines that aim to link groups of words to their inherent topics. Current approaches to this task suffer in quality, but we argue that their performance might be improved by focusing on the structure in the data. Building upon research on concept disambiguation and linking to DBpedia, we take a novel approach to topic labelling that makes use of the structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text are likely to refer to concepts that lie close together in the DBpedia graph. Using graph centrality measures, we show that we are able to identify the concepts that best represent the topics. We comparatively evaluate our graph-based approach and the standard text-based approach on topics extracted from three corpora, based on results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia can achieve better results for topic labelling in terms of both precision and topic coverage.
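The hypothesis in the abstract (topic words map to concepts that lie close together in the DBpedia graph, and the most central concepts make good labels) can be illustrated with a small sketch. This is not the authors' implementation and makes several loud assumptions: `TOY_GRAPH` is a tiny hand-made stand-in for a DBpedia neighbourhood (not real DBpedia data), `WORD_TO_CONCEPT` is a hypothetical replacement for the concept disambiguation and linking step, and `focused_closeness` (mean inverse hop distance to the seed concepts) is one illustrative centrality choice, not necessarily any of the measures evaluated in the paper.

```python
from collections import deque

# ASSUMPTION: toy, hand-made undirected concept graph standing in for a small
# DBpedia neighbourhood. Node names and edges are illustrative only.
TOY_GRAPH = {
    "Atom": ["Electron", "Physics", "Atomic_orbital"],
    "Electron": ["Atom", "Particle_physics", "Atomic_orbital"],
    "Energy": ["Physics"],
    "Quantum_mechanics": ["Physics", "Atomic_orbital", "Particle_physics"],
    "Classical_mechanics": ["Physics", "Orbit"],
    "Orbit": ["Classical_mechanics"],
    "Particle_physics": ["Electron", "Quantum_mechanics", "Physics"],
    "Atomic_orbital": ["Atom", "Electron", "Quantum_mechanics"],
    "Physics": ["Atom", "Energy", "Quantum_mechanics",
                "Classical_mechanics", "Particle_physics"],
}

# ASSUMPTION: hypothetical word-to-concept links; a real system would obtain
# these from an entity-linking/disambiguation step, not modelled here.
WORD_TO_CONCEPT = {
    "atom": "Atom", "energy": "Energy", "electron": "Electron",
    "quantum": "Quantum_mechanics", "classic": "Classical_mechanics",
    "orbit": "Orbit", "particle": "Particle_physics",
}


def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def expand(graph, seeds, radius):
    """Induced subgraph on all nodes within `radius` hops of any seed."""
    nodes = set()
    for seed in seeds:
        nodes.update(n for n, d in bfs_distances(graph, seed).items() if d <= radius)
    return {n: [m for m in graph.get(n, ()) if m in nodes] for n in nodes}


def focused_closeness(graph, candidate, seeds):
    """Mean inverse hop distance from `candidate` to the other seed nodes.

    Unreachable seeds contribute 0; a seed does not score against itself.
    """
    dist = bfs_distances(graph, candidate)
    total = sum(1.0 / dist[s] for s in seeds if s != candidate and s in dist)
    return total / len(seeds)


def label_topic(words, graph=TOY_GRAPH, links=WORD_TO_CONCEPT, k=3, radius=1):
    """Return the k candidate concepts that sit most centrally among the seeds."""
    # Map words to seed concepts, de-duplicating while keeping order.
    seeds = list(dict.fromkeys(links[w] for w in words if w in links))
    if not seeds:
        return []
    subgraph = expand(graph, seeds, radius)
    scores = {c: focused_closeness(subgraph, c, seeds) for c in subgraph}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


if __name__ == "__main__":
    topic = ["atom", "energy", "electron", "quantum", "classic", "orbit", "particle"]
    print(label_topic(topic))
```

With these toy edges the top-3 candidates are `Physics`, `Atomic_orbital` and `Particle_physics`. Two of them are not linked from any topic word, which is why the sketch ranks over the expanded neighbourhood rather than over the seeds alone. The output reflects only the invented graph and says nothing about the labels reported in Table 1.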

Figures

Figure 5: Precision and Coverage (y axis) @top-k (x axis) for the three corpora.

Figure 6: Influence of number of seed nodes

Table 2: Correlation of the focused centrality measures

Figure 3: Evolution from topic words to candidate labels.

Table 1: Example top-3 labels for topic: [atom, energy, electron, quantum, classic, orbit, particle]

Figure 1: Canopy framework for automated topic analysis
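Figure 5 reports precision and coverage at top-k. This page does not define either measure, so the sketch below assumes common definitions: precision@k is the mean, over topics, of the fraction of the top-k labels judged good, and coverage@k is the fraction of topics with at least one good label among the top k. The judgement format (a ranked list of booleans per topic) and the example data are hypothetical.

```python
def precision_at_k(judgements, k):
    """Mean, over topics, of the fraction of top-k labels judged good.

    ASSUMPTION: `judgements` maps each topic to a ranked list of booleans
    (True = the label at that rank was judged a good label).
    """
    per_topic = [sum(r[:k]) / len(r[:k]) for r in judgements.values() if r[:k]]
    return sum(per_topic) / len(per_topic) if per_topic else 0.0


def coverage_at_k(judgements, k):
    """Fraction of topics with at least one good label in the top k."""
    if not judgements:
        return 0.0
    covered = sum(1 for r in judgements.values() if any(r[:k]))
    return covered / len(judgements)


if __name__ == "__main__":
    # Hypothetical judgements for three topics' ranked candidate labels.
    example = {
        "topic_a": [True, False, True],
        "topic_b": [False, False, True],
        "topic_c": [False, False, False],
    }
    for k in (1, 2, 3):
        print(k, round(precision_at_k(example, k), 3),
              round(coverage_at_k(example, k), 3))
```

Under these assumed definitions coverage can only grow as k increases, whereas precision can rise or fall depending on how good the lower-ranked labels are.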

Citations

Proceedings Article

Knowledge-based graph document modeling

Michael Schuhmacher et al.

TL;DR: This work proposes a graph-based semantic model for representing document content that combines DBpedia's structure, in particular its explicit semantic relations, with an information-theoretic measure of concept association, and achieves a performance close to that of highly specialized methods tuned to specific tasks.

Journal Article

Information extraction meets the semantic web: a survey

José-Lázaro Martínez-Rodríguez et al.

01 Jan 2020

Social Work

Book Chapter

Path-Based Semantic Relatedness on Linked Data and Its Use to Word and Entity Disambiguation

Ioana Hulpus et al.

TL;DR: This paper shows that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base, and proposes a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness.

Posted Content

Automatic Labelling of Topics with Neural Embeddings

Shraey Bhatia et al.

16 Dec 2016

arXiv: Computation and Language

TL;DR: This work proposes labelling a topic with a succinct phrase that summarises its theme or idea, using Wikipedia document titles as label candidates and computing neural embeddings for documents and words to select the most relevant labels for topics.

Book Chapter

A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles

Besnik Fetahu et al.

TL;DR: This work proposes an approach for creating linked dataset profiles that generates accurate profiles even with comparably small sample sizes and outperforms established topic modelling approaches.

References

Journal Article

Latent dirichlet allocation

David M. Blei et al.

01 Mar 2003

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei et al.

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Book

Networks: An Introduction

Mark Newman

TL;DR: This book brings together for the first time the most important breakthroughs in each of these fields and presents them in a coherent fashion, highlighting the strong interconnections between work in different areas.

Proceedings Article

Probabilistic latent semantic analysis

Thomas Hofmann

TL;DR: This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM. It is based on a mixture decomposition derived from a latent class model, which results in a more principled approach with a solid foundation in statistics.

Journal Article

A measure of betweenness centrality based on random walks

Mark Newman

01 Jan 2005

Social Networks

TL;DR: In this paper, the authors propose a measure of betweenness based on random walks, counting how often a node is traversed by a random walk between two other nodes, rather than only along shortest paths.

Related Papers (5)

Latent dirichlet allocation

David M. Blei et al.

01 Mar 2003

Journal of Machine Learning Research

DBpedia - A crystallization point for the Web of Data

Christian Bizer et al.

01 Sep 2009

Journal of Web Semantics

Probabilistic latent semantic indexing

DBpedia spotlight: shedding light on the web of documents

Computing semantic relatedness using Wikipedia-based explicit semantic analysis