Author

Didier Cherix

Bio: Didier Cherix is an academic researcher from Leipzig University. The author has contributed to research in topics: Semantic Web & Question answering. The author has an h-index of 5 and has co-authored 7 publications receiving 294 citations.

Papers
Proceedings ArticleDOI
18 May 2015
TL;DR: GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
Abstract: We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
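The kind of uniform evaluation GERBIL performs can be illustrated by comparing a tool's entity annotations against a gold standard with micro- and macro-averaged F1. The sketch below is a simplified illustration, not GERBIL's actual API; annotations are modelled as (start, end, entity_uri) tuples per document.

```python
# Simplified sketch of annotation evaluation as done in frameworks like GERBIL:
# per-document (macro) and corpus-level (micro) F1 over annotation sets.

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def evaluate(gold, predicted):
    """gold/predicted: one set of (start, end, uri) annotations per document."""
    doc_f1, tp_sum, fp_sum, fn_sum = [], 0, 0, 0
    for g, p in zip(gold, predicted):
        tp, fp, fn = len(g & p), len(p - g), len(g - p)
        doc_f1.append(f1(tp, fp, fn))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    return {
        "macro_f1": sum(doc_f1) / len(doc_f1),  # average of per-document F1
        "micro_f1": f1(tp_sum, fp_sum, fn_sum),  # F1 over pooled counts
    }

gold = [{(0, 6, "dbr:Berlin")}, {(0, 5, "dbr:Paris"), (10, 16, "dbr:France")}]
pred = [{(0, 6, "dbr:Berlin")}, {(0, 5, "dbr:Paris")}]
scores = evaluate(gold, pred)
```

Micro and macro averages can diverge noticeably on skewed datasets, which is one reason comparable frameworks report both.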

219 citations

Book ChapterDOI
29 May 2016
TL;DR: This work provides an approach driven by a core QA vocabulary that is aligned to existing, powerful ontologies provided by domain-specific communities; it is agnostic to implementation details and inherently follows the linked data principles.
Abstract: It is very challenging to access the knowledge expressed within big data sets. Question answering (QA) aims at making sense out of data via a simple-to-use interface. However, QA systems are very complex and earlier approaches are mostly singular and monolithic implementations for QA in specific domains. Therefore, it is cumbersome and inefficient to design and implement new or improved approaches, in particular as many components are not reusable. Hence, there is a strong need for enabling best-of-breed QA systems, where the best performing components are combined, aiming at the best quality achievable in the given domain. Taking into account the high variety of functionality that might be of use within a QA system and therefore reused in new QA systems, we provide an approach driven by a core QA vocabulary that is aligned to existing, powerful ontologies provided by domain-specific communities. We achieve this by a methodology for binding existing vocabularies to our core QA vocabulary without re-creating the information provided by external components. We thus provide a practical approach for rapidly establishing new domain-specific QA systems, while the core QA vocabulary is re-usable across multiple domains. To the best of our knowledge, this is the first approach to open QA systems that is agnostic to implementation details and that inherently follows the linked data principles.
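The binding idea can be sketched as emitting linked-data triples that align a domain ontology to a core QA vocabulary instead of duplicating the domain information. The namespaces and property names below are placeholders for illustration, not the paper's actual qa vocabulary.

```python
# Illustrative sketch of binding a domain annotation to a core QA vocabulary,
# emitted as N-Triples using only the standard library.

QA = "http://example.org/qa#"        # assumed core QA namespace (placeholder)
DOMAIN = "http://example.org/geo#"   # assumed domain ontology (placeholder)
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def triple(s, p, o):
    return f"<{s}> <{p}> <{o}> ."

def annotate_question(question_uri, span_uri, domain_class):
    """Link a question span to a domain concept via the core vocabulary."""
    return [
        triple(span_uri, QA + "annotationOf", question_uri),
        # alignment: the domain class is declared a subclass of the core
        # concept, so domain communities keep their own ontologies intact
        triple(domain_class, RDFS + "subClassOf", QA + "Concept"),
        triple(span_uri, QA + "hasConcept", domain_class),
    ]

triples = annotate_question("urn:q1", "urn:q1-span-0-6", DOMAIN + "City")
```

Because the alignment is expressed as ordinary triples, existing domain data needs no re-modelling; a reasoner or SPARQL query can follow the subclass link at query time.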

44 citations

Book ChapterDOI
29 May 2016
TL;DR: This work follows the research agenda of establishing an ecosystem for components of QA systems, which will enable the QA community to elevate the reusability of such components and to intensify their research activities.
Abstract: Question answering (QA) systems focus on making sense out of data via an easy-to-use interface. However, these systems are very complex and integrate a lot of technology tightly. Previously presented QA systems are mostly singular and monolithic implementations. Hence, their reusability is limited. In contrast, we follow the research agenda of establishing an ecosystem for components of QA systems, which will enable the QA community to elevate the reusability of such components and to intensify their research activities.

37 citations

Book ChapterDOI
05 Jun 2017
TL;DR: The main goal is to show how the research community can use Qanary to gain new insights into QA processes; this is illustrated by focusing on the Entity Linking task w.r.t. textual natural language input, a fundamental step in most QA processes.
Abstract: The field of Question Answering (QA) is very multi-disciplinary as it requires expertise from a large number of areas such as natural language processing (NLP), artificial intelligence, machine learning, information retrieval, speech recognition and semantic technologies. In the past years a large number of QA systems were proposed using approaches from different fields and focusing on particular tasks in the QA process. Unfortunately, most of these systems cannot be easily reused or extended, and their results cannot be easily reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardized interfaces and are often not open source or available as Web services. To address these issues we developed the knowledge-based Qanary methodology for choreographing QA pipelines distributed over the Web. Qanary employs the qa vocabulary as an exchange format for typical QA components. As a result, QA systems can be built using the Qanary methodology in a simpler, more flexible and standardized way while becoming knowledge-driven instead of being process-oriented. This paper presents the components and services that are integrated using the qa vocabulary and the Qanary methodology within the Qanary ecosystem. Moreover, we show how the Qanary ecosystem can be used to analyse QA processes to detect weaknesses and research gaps. We illustrate this by focusing on the Entity Linking (EL) task w.r.t. textual natural language input, which is a fundamental step in most QA processes. Additionally, we contribute the first EL benchmark for QA, as open source. Our main goal is to show how the research community can use Qanary to gain new insights into QA processes.
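The Entity Linking step analysed here can be reduced to two sub-tasks: candidate generation from a label index and candidate ranking. The toy index and popularity scores below are invented for illustration; real Qanary EL components query knowledge bases such as DBpedia.

```python
# Minimal Entity Linking (EL) sketch: look up candidate entities for a
# surface form, then rank them. Index contents are toy data.

LABEL_INDEX = {  # surface form -> list of (entity_uri, popularity)
    "paris": [("dbr:Paris", 0.9), ("dbr:Paris,_Texas", 0.2)],
    "berlin": [("dbr:Berlin", 0.95)],
}

def link(surface_form):
    """Return the most popular candidate entity, or None if unknown."""
    candidates = LABEL_INDEX.get(surface_form.lower(), [])
    if not candidates:
        return None
    # rank candidates by popularity and keep the best one
    return max(candidates, key=lambda c: c[1])[0]

linked = link("Paris")
```

A benchmark like the one contributed in the paper measures exactly how often such a linker picks the referent intended by the question, which popularity alone often gets wrong for ambiguous names.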

23 citations

01 Jan 2014
TL;DR: This system provides a semi-automatic approach for instance-level error detection in ontologies which is agnostic of the underlying Linked Data knowledge base and works at very low cost.
Abstract: Over the past years, a vast number of datasets have been published based on Semantic Web standards, which provides an opportunity for creating novel industrial applications. However, industrial requirements on data quality are high while the time to market as well as the required costs for data preparation have to be kept low. Unfortunately, many Linked Data sources are error-prone, which prevents their direct use in productive systems. Hence, (semi-)automatic quality assurance processes are needed as manual ontology repair procedures by domain experts are expensive and time consuming. In this article, we present CROCUS – a pipeline for cluster-based ontology data cleansing. Our system provides a semi-automatic approach for instance-level error detection in ontologies which is agnostic of the underlying Linked Data knowledge base and works at very low cost. CROCUS was evaluated on two datasets. The experiments show that we are able to detect errors with high recall.
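The core intuition of cluster-based error detection can be shown with a deliberately simplified stand-in: embed each instance as a numeric feature vector and flag instances unusually far from the centroid. CROCUS itself clusters Linked Data instances; this z-score sketch only illustrates the idea, and the data is toy data.

```python
# Simplified stand-in for CROCUS-style instance-level error detection.
import math

def mean(vectors):
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def error_candidates(instances, threshold=1.5):
    """Flag instance ids whose distance to the centroid exceeds the mean
    distance by more than threshold standard deviations."""
    ids, vectors = zip(*instances)
    c = mean(vectors)
    dists = [distance(v, c) for v in vectors]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    return [i for i, d in zip(ids, dists) if d > mu + threshold * sigma]

# toy data: a numeric property of "city" instances; the last one is wrong
cities = [("c1", [3.5]), ("c2", [2.1]), ("c3", [1.8]), ("bad", [900.0])]
outliers = error_candidates(cities)
```

As in CROCUS, the flagged instances are only candidates: a semi-automatic process still hands them to a domain expert for confirmation, which keeps costs low without fully trusting the statistics.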

7 citations


Cited by
Journal ArticleDOI
TL;DR: An in-depth overview of the current state-of-the-art of aspect-level sentiment analysis is given, showing the tremendous progress that has been made in finding the target, which can be an entity as such or some aspect of it, and the corresponding sentiment.
Abstract: The field of sentiment analysis, in which sentiment is gathered, analyzed, and aggregated from text, has seen a lot of attention in the last few years. The corresponding growth of the field has resulted in the emergence of various subareas, each addressing a different level of analysis or research question. This survey focuses on aspect-level sentiment analysis, where the goal is to find and aggregate sentiment on entities mentioned within documents or aspects of them. An in-depth overview of the current state-of-the-art is given, showing the tremendous progress that has already been made in finding the target, which can be an entity as such or some aspect of it, and the corresponding sentiment. Aspect-level sentiment analysis yields very fine-grained sentiment information which can be useful for applications in various domains. Current solutions are categorized based on whether they provide a method for aspect detection, sentiment analysis, or both. Furthermore, a breakdown based on the type of algorithm used is provided. For each discussed study, the reported performance is included. To facilitate the quantitative evaluation of the various proposed methods, a call is made for the standardization of the evaluation methodology that includes the use of shared data sets. Semantically-rich concept-centric aspect-level sentiment analysis is discussed and identified as one of the most promising future research directions.
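The two sub-tasks the survey categorizes, aspect detection and sentiment assignment, can be sketched with a tiny lexicon-based baseline: detect aspect terms, then attribute each opinion word to the nearest aspect. The lexicons here are toy data; the surveyed systems use far richer models.

```python
# Tiny lexicon-based sketch of aspect-level sentiment analysis.

ASPECTS = {"battery", "screen", "price"}          # toy aspect lexicon
OPINIONS = {"great": 1, "good": 1, "poor": -1, "terrible": -1}  # toy lexicon

def aspect_sentiment(sentence):
    tokens = sentence.lower().replace(",", "").split()
    aspect_pos = [(i, t) for i, t in enumerate(tokens) if t in ASPECTS]
    scores = {t: 0 for _, t in aspect_pos}
    for i, tok in enumerate(tokens):
        if tok in OPINIONS and aspect_pos:
            # attribute the opinion word to the closest aspect term
            _, nearest = min(aspect_pos, key=lambda a: abs(a[0] - i))
            scores[nearest] += OPINIONS[tok]
    return scores

result = aspect_sentiment("The battery is great, but the screen is terrible")
```

Even this baseline shows why aspect-level output is more informative than a single document score: the same sentence carries opposite polarities for different aspects.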

579 citations

Journal ArticleDOI
TL;DR: An overview of the techniques used in current QA systems over KBs is given, which were evaluated on a popular series of benchmarks: Question Answering over Linked Data and WebQuestions.
Abstract: The Semantic Web contains an enormous amount of information in the form of knowledge bases (KB). To make this information available, many question answering (QA) systems over KBs were created in the last years. Building a QA system over KBs is difficult because there are many different challenges to be solved. In order to address these challenges, QA systems generally combine techniques from natural language processing, information retrieval, machine learning and Semantic Web. The aim of this survey is to give an overview of the techniques used in current QA systems over KBs. We present the techniques used by the QA systems which were evaluated on a popular series of benchmarks: Question Answering over Linked Data. Techniques that solve the same task are first grouped together and then described. The advantages and disadvantages are discussed for each technique. This allows a direct comparison of similar techniques. Additionally, we point to techniques that are used over WebQuestions and SimpleQuestions, which are two other popular benchmarks for QA systems.
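The core task the surveyed systems solve is mapping a natural language question to a SPARQL query over a KB. A deliberately naive template-based sketch (one of the technique families such surveys cover) is shown below; the patterns and property URIs are toy stand-ins for full QA pipelines.

```python
# Naive template-based question-to-SPARQL mapping (illustrative only).
import re

TEMPLATES = [
    # (question pattern, SPARQL template filled with the matched label)
    (re.compile(r"who wrote (.+)\?", re.I),
     'SELECT ?author WHERE {{ ?book rdfs:label "{0}"@en ; dbo:author ?author }}'),
    (re.compile(r"where is (.+)\?", re.I),
     'SELECT ?place WHERE {{ ?thing rdfs:label "{0}"@en ; dbo:location ?place }}'),
]

def to_sparql(question):
    """Return a SPARQL query string, or None if no template matches."""
    for pattern, template in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            return template.format(m.group(1))
    return None

query = to_sparql("Who wrote Dracula?")
```

Templates are brittle, which is exactly why the surveyed systems layer entity linking, relation classification and learning-based query construction on top of this basic mapping step.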

268 citations

Proceedings ArticleDOI
03 Apr 2017
TL;DR: This work trains a neural network for answering simple questions in an end-to-end manner, leaving all decisions to the model; it contains a nested word/character-level question encoder that handles out-of-vocabulary and rare words while still exploiting word-level semantics.
Abstract: Question Answering (QA) systems over Knowledge Graphs (KG) automatically answer natural language questions using facts contained in a knowledge graph. Simple questions, which can be answered by the extraction of a single fact, constitute a large part of questions asked on the web but still pose challenges to QA systems, especially when asked against a large knowledge resource. Existing QA systems usually rely on various components, each specialised in solving a different sub-task of the problem (such as segmentation, entity recognition, disambiguation, and relation classification). In this work, we follow a quite different approach: we train a neural network for answering simple questions in an end-to-end manner, leaving all decisions to the model. It learns to rank subject-predicate pairs to enable the retrieval of relevant facts given a question. The network contains a nested word/character-level question encoder which makes it possible to handle out-of-vocabulary and rare words while still exploiting word-level semantics. Our approach achieves results competitive with state-of-the-art end-to-end approaches that rely on an attention mechanism.
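The rank-and-retrieve setup described above can be illustrated without the neural model: score each subject-predicate pair against the question and return the fact of the best-scoring pair. As a non-neural stand-in for the learned encoder, this sketch uses character-trigram Jaccard similarity; the facts are toy data.

```python
# Non-neural stand-in for ranking subject-predicate pairs against a question.

def trigrams(text):
    t = f"##{text.lower()}##"  # pad so short tokens still yield trigrams
    return {t[i:i + 3] for i in range(len(t) - 2)}

def score(question, subject_label, predicate_label):
    """Jaccard similarity between question and candidate-pair trigrams."""
    q = trigrams(question)
    cand = trigrams(subject_label) | trigrams(predicate_label)
    return len(q & cand) / len(q | cand)

def best_fact(question, candidates):
    """candidates: list of (subject_label, predicate_label, answer)."""
    return max(candidates, key=lambda c: score(question, c[0], c[1]))

facts = [
    ("Berlin", "country", "Germany"),
    ("Berlin", "population", "3.6 million"),
]
answer = best_fact("What country is Berlin in?", facts)[2]
```

Character-level matching is also why the paper's nested word/character encoder copes with rare and out-of-vocabulary words: sub-word units still overlap even when a full word was never seen in training.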

248 citations

Journal ArticleDOI
TL;DR: This survey analyzes 62 different SQA systems, which are systematically and manually selected using predefined inclusion and exclusion criteria, leading to 72 selected publications out of 1960 candidates, and identifies common challenges, structure solutions, and provide recommendations for future systems.
Abstract: Semantic Question Answering (SQA) removes two major access requirements to the Semantic Web: the mastery of a formal query language like SPARQL and knowledge of a specific vocabulary. Because of the complexity of natural language, SQA presents difficult challenges and many research opportunities. Instead of a shared effort, however, many essential components are redeveloped, which is an inefficient use of researchers' time and resources. This survey analyzes 62 different SQA systems, which are systematically and manually selected using predefined inclusion and exclusion criteria, leading to 72 selected publications out of 1960 candidates. We identify common challenges, structure solutions, and provide recommendations for future systems. This work is based on publications from the end of 2010 to July 2015 and is also compared to older but similar surveys.

205 citations

Book ChapterDOI
11 Oct 2015
TL;DR: TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB, and enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables.
Abstract: Web tables form a valuable source of relational data. The Web contains an estimated 154 million HTML tables of relational data, with Wikipedia alone containing 1.6 million high-quality tables. Extracting the semantics of Web tables to produce machine-understandable knowledge has become an active area of research. A key step in extracting the semantics of Web content is entity linking (EL): the task of mapping a phrase in text to its referent entity in a knowledge base (KB). In this paper we present TabEL, a new EL system for Web tables. TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB. Instead, TabEL enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables. In experiments, TabEL significantly reduces error when compared to current state-of-the-art table EL systems, including a 75% error reduction on Wikipedia tables and a 60% error reduction on Web tables. We also make our parsed Wikipedia table corpus and test datasets publicly available for future work.
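The soft-constraint idea can be sketched as scoring a candidate assignment of entities to table cells by how often the chosen entities co-occur, then picking the most coherent joint assignment. The co-occurrence counts below are invented, and the exhaustive search stands in for TabEL's graphical-model inference.

```python
# Sketch of coherence-based table entity linking via co-occurrence counts.
from itertools import combinations, product

COOCCUR = {  # symmetric co-occurrence counts between entity pairs (toy data)
    frozenset({"dbr:Paris", "dbr:France"}): 120,
    frozenset({"dbr:Paris,_Texas", "dbr:France"}): 1,
}

def coherence(entity_set):
    """Sum pairwise co-occurrence over all entity pairs in the set."""
    return sum(COOCCUR.get(frozenset(p), 0)
               for p in combinations(entity_set, 2))

def best_assignment(candidate_lists):
    """Pick one entity per cell so the joint set is most coherent
    (exhaustive search; real systems use approximate inference)."""
    return max(product(*candidate_lists), key=coherence)

# one candidate list per table cell in the same row
cells = [["dbr:Paris", "dbr:Paris,_Texas"], ["dbr:France"]]
chosen = best_assignment(cells)
```

Because the co-occurrence statistics come from data rather than a fixed KB schema, the constraint stays soft: an unusual but genuine entity combination is merely penalized, not ruled out.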

162 citations