Author

Götz Fabian

Bio: Götz Fabian is an academic researcher from Dresden University of Technology. The author has contributed to research in topics: Process ontology & Ontology components. The author has an h-index of 2 and has co-authored 2 publications receiving 25 citations.

Papers
Proceedings Article
07 Dec 2011
TL;DR: A demo of the capabilities of DOG4DAG, the Dresden Ontology Generator for Directed Acyclic Graphs, which is available as a plugin to both OBO-Edit and Protégé, with a summary of the strengths and limits of the different steps of the generation process.
Abstract: In the biomedical domain, Protege and OBO-Edit are the main ontology editors supporting the manual construction of ontologies. Since manual creation is a laborious and hence costly process, there have been efforts to automate parts of this process. Here, we give a demo of the capabilities of DOG4DAG, the Dresden Ontology Generator for Directed Acyclic Graphs, which is available as a plugin to both OBO-Edit and Protege. In the demo, we describe how to generate terms and in particular siblings, definitions, and is-a relationships using an example in the domain of nervous system diseases. We summarise the strengths and limits of the different steps of the generation process.

17 citations

Journal Article
TL;DR: The method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies, and for MeSH in particular, it can be considered complete in its medical focus area.
Abstract: Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level. Results: We developed an approach to extend ontologies by discovering new terms which are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches which retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second approach uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH (Medical Subject Headings) shows that our method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61% where the two independent approaches are complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Protege, making it the first plugin that supports sibling discovery on-the-fly. Availability: Sibling discovery for ontology is available as part of DOG4DAG (www.biotec.tu-dresden.de/research/schroeder/dog4dag) for both Protege 4.1 and OBO-Edit 2.1. Contact: ms@biotec.tu-dresden.de; goetz.fabian@biotec.tu-dresden.de Supplementary information: Supplementary data are available at Bioinformatics online.

12 citations


Cited by
Journal Article
TL;DR: This paper synthesizes the extant literature in NLP in accounting, auditing and finance to establish the state of current knowledge and to identify paths for future research.
Abstract: Natural language processing (NLP) is a part of the artificial intelligence domain focused on communication between humans and computers. NLP attempts to address the inherent problem that while human communications are often ambiguous and imprecise, computers require unambiguous and precise messages to enable understanding. The accounting, auditing and finance domains frequently put forth textual documents intended to communicate a wide variety of messages, including, but not limited to, corporate financial performance, management's assessment of current and future firm performance, analysts' assessments of firm performance, domain standards and regulations as well as evidence of compliance with relevant standards and regulations. NLP applications have been used to mine these documents to obtain insights, make inferences and to create additional methodologies and artefacts to advance knowledge in accounting, auditing and finance. This paper synthesizes the extant literature in NLP in accounting, auditing and finance to establish the state of current knowledge and to identify paths for future research. Copyright © 2016 John Wiley & Sons, Ltd.

123 citations

Journal Article
TL;DR: A descriptive study of the current extent of term reuse and overlap among biomedical ontologies, which stipulate the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.
Abstract: Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse comes with the promise to support the semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap between 25-31%, the term reuse is only 90% semantic similarity, hinting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analysing the logs generated from a Protege plugin that enables developers to reuse terms from BioPortal. We find most reuse constructs were 2-level subtrees on the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns that indicate that ontology developers did intend to reuse terms from other ontologies, but that they were using different and sometimes incorrect representations. Our results stipulate the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.

60 citations

Journal Article
TL;DR: A method for the automated assignment of additional classes to patent documents is evaluated, and a system for guided patent search based on the use of class co-occurrence information and external resources is proposed.
Abstract: Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option to expand patent search beyond pure keywords is the inclusion of classification information: Since every patent is assigned at least one class code, it should be possible for these assignments to be automatically used in a similar way as the MeSH annotations in PubMed. In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. This report describes our comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms/classes respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows a strong structural similarity of the hierarchies, but significant differences of terms and annotations. The low number of IPC class assignments and the lack of occurrences of class labels in patent texts imply that current patent search is severely limited. To overcome these limits, we evaluate a method for the automated assignment of additional classes to patent documents, and we propose a system for guided patent search based on the use of class co-occurrence information and external resources.

51 citations

Journal Article
17 Jan 2013
TL;DR: The purpose of this collection was to compile a bibliography which could be of help to students and young researchers, and to collect references concerning the handling of time and evolution issues in Semantic Web research.
Abstract: Time is a pervasive dimension of reality as everything evolves as time elapses. Therefore, Web-based information systems and knowledge representation tools at least mirror, and often have to capture, the time-varying and evolutionary nature of the phenomena they model and of the activities they support. This aspect has been acknowledged and long studied in the field of temporal databases [Jensen and Snodgrass 2009] but it truly applies also to the World Wide Web and Semantic Web in particular. Several papers addressing, in an explicit or implicit way, the representation and management of time and evolution in the Semantic Web appeared recently and, on some aspects, showed a clear upward trend in recent years, witnessing a sustained and/or growing research interest. Reflecting and acknowledging such interest, we started in 2011 to collect references concerning the handling of time and evolution issues in Semantic Web research. As it was for [Grandi 2003], the purpose of this collection was to compile a bibliography which could be of help, in particular, to students and young researchers. As a result of such almost endless work, we wrote an annotated bibliography [Grandi 2012], whose latest version is available on the Web at URL:

33 citations

Journal Article
TL;DR: In this review, more than 90 relevant research studies have been analyzed, describing the most important practical applications, terminological resources, tools, and open challenges of TM in medicine.
Abstract: Health care professionals produce abundant textual information in their daily clinical practice, and this information is stored in many diverse sources, generally in textual form. The extraction of insights from all the gathered information, mainly unstructured and lacking normalization, is one of the major challenges in computational medicine. In this respect, text mining (TM) assembles different techniques to derive valuable insights from unstructured textual data, and it has therefore become especially relevant in medicine. The aim of this paper is to provide an extensive review of existing techniques and resources to perform TM tasks in medicine. In this review, more than 90 relevant research studies have been analyzed, describing the most important practical applications, terminological resources, tools, and open challenges of TM in medicine.

33 citations