"SINA: semantic interpretation of user queries for question answering on interlinked data" by Saeedeh Shekarpour with Prateek Jain as coordinator

doi:10.1145/2641730.2641733

Journal Article

Saeedeh Shekarpour, +1 more

- 01 Jul 2014 -

ACM Sigweb Newsletter

- Vol. 2014, pp 3

TLDR

This dissertation presents a question answering system that transforms user-supplied queries into conjunctive SPARQL queries over a set of interlinked data sources. It employs a hidden Markov model whose parameters were bootstrapped with different distribution functions.

Abstract:

The Data Web contains a wealth of knowledge on a large number of domains. Question answering over interlinked data sources is challenging due to two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting the links between those datasets at both the schema and the instance level. In this dissertation, we present a question answering system that transforms user-supplied queries (i.e., either natural language sentences or keywords) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows:

1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation). We employ a hidden Markov model whose parameters were bootstrapped with different distribution functions.
2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach relies on a combination of domain and range inference and a link traversal method to construct a connected graph, which is ultimately rendered as a corresponding SPARQL query.

The evaluation, which uses three life-science datasets and 25 benchmark queries, demonstrates the effectiveness of our approach: the system achieves 100% precision on QALD-1 and performs as well as the best question answering system from the QALD-3 competition, answering 32 questions correctly.
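To make the disambiguation idea concrete, the following is a minimal sketch of how a hidden Markov model can pick one resource per query keyword using standard Viterbi decoding. It is an illustration only and not the SINA implementation: the `ex:` resource names, the keyword segments, and all probabilities are hypothetical, and the abstract does not specify how the actual parameters were bootstrapped.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               predecessor state on that path)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}
    ]
    for obs in observations[1:]:
        prev = best[-1]
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            p, back = max(
                ((prev[r][0] * trans_p[r].get(s, 0.0), r) for r in states),
                key=lambda pair: pair[0],
            )
            column[s] = (p * emit_p[s].get(obs, 0.0), back)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical candidate resources (hidden states) for two keyword segments.
states = ["ex:sideEffect", "ex:AspirinDrug", "ex:AspirinAlbum"]
keywords = ["side effects", "aspirin"]

# Made-up parameters: the drug is more closely linked to the property
# than the unrelated album, so that transition gets a higher probability.
start_p = {"ex:sideEffect": 0.4, "ex:AspirinDrug": 0.3, "ex:AspirinAlbum": 0.3}
trans_p = {
    "ex:sideEffect": {"ex:sideEffect": 0.1, "ex:AspirinDrug": 0.8, "ex:AspirinAlbum": 0.1},
    "ex:AspirinDrug": {"ex:sideEffect": 0.8, "ex:AspirinDrug": 0.1, "ex:AspirinAlbum": 0.1},
    "ex:AspirinAlbum": {"ex:sideEffect": 0.2, "ex:AspirinDrug": 0.2, "ex:AspirinAlbum": 0.6},
}
emit_p = {
    "ex:sideEffect": {"side effects": 0.9},
    "ex:AspirinDrug": {"aspirin": 0.6},
    "ex:AspirinAlbum": {"aspirin": 0.6},
}

prob, path = viterbi(keywords, states, start_p, trans_p, emit_p)
print(path)
```

In this toy setup both `ex:AspirinDrug` and `ex:AspirinAlbum` match the keyword "aspirin" equally well, so the transition probabilities decide between them. This mirrors the intuition that resources connected to each other in the underlying data are more likely to belong to the same interpretation of the query.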