Proceedings Article

NERD: A framework for evaluating named entity recognition tools in the Web of data

01 Jan 2011, pp. 1-4
TL;DR: NERD enables the comparison of different popular Linked Data entity extractors that expose APIs, such as AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais and Zemanta.
Abstract: In this paper, we present NERD, an evaluation framework we have developed that records and analyzes human ratings of Named Entity (NE) extraction and disambiguation tools working on English plain-text articles. NERD enables the comparison of different popular Linked Data entity extractors that expose APIs, such as AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais and Zemanta. Given an article and a particular tool, a user can assess the precision of the extracted named entities, their typing, the Linked Data URIs provided for disambiguation, and their subjective relevance to the text. All user interactions are stored in a database. We propose the NERD ontology, which defines mappings between the types detected by the different NE extractors. The NERD framework then enables users to visualize the comparative performance of these tools with respect to human assessment.
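To make the idea of aligning extractor types concrete, here is a minimal Python sketch, assuming each extractor returns one native type label per entity. The mapping table, the `nerd:` class names, the `nerd:Thing` fallback and the function names are illustrative assumptions, not the axioms of the published NERD ontology.

```python
from collections import defaultdict

# Illustrative mapping (assumption): extractor-specific type labels -> a shared class.
# These entries are examples only, not the real NERD ontology axioms.
TYPE_MAPPINGS = {
    "alchemyapi": {"Person": "nerd:Person", "Company": "nerd:Organization", "City": "nerd:Location"},
    "opencalais": {"Person": "nerd:Person", "Company": "nerd:Organization", "City": "nerd:Location"},
    "dbpedia_spotlight": {
        "DBpedia:Person": "nerd:Person",
        "DBpedia:Organisation": "nerd:Organization",
        "DBpedia:Place": "nerd:Location",
    },
}

# Hypothetical catch-all class for labels with no known alignment.
FALLBACK = "nerd:Thing"


def to_nerd_type(extractor, native_type):
    """Map an extractor's native type label to the shared class, else the fallback."""
    return TYPE_MAPPINGS.get(extractor, {}).get(native_type, FALLBACK)


def group_by_nerd_type(entities):
    """Group (extractor, surface_form, native_type) tuples by their aligned class."""
    grouped = defaultdict(list)
    for extractor, surface, native in entities:
        grouped[to_nerd_type(extractor, native)].append((extractor, surface))
    return dict(grouped)


if __name__ == "__main__":
    print(to_nerd_type("opencalais", "Company"))
    print(group_by_nerd_type([("alchemyapi", "Paris", "City"),
                              ("dbpedia_spotlight", "Paris", "DBpedia:Place")]))
```

Once every tool's labels resolve to the same class, outputs such as "City" and "DBpedia:Place" become directly comparable, which is the precondition for the side-by-side comparison the abstract describes.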

Citations
Proceedings Article
23 Apr 2012
TL;DR: NERD is proposed, a framework which unifies 10 popular named entity extractors available on the web, and the NERD ontology which provides a rich set of axioms aligning the taxonomies of these tools.
Abstract: Named Entity Extraction is a mature task in the NLP field that has yielded numerous services gaining popularity in the Semantic Web community for extracting knowledge from web documents. These services are generally organized as pipelines, using dedicated APIs and different taxonomies for extracting, classifying and disambiguating named entities. Integrating one of these services into a particular application requires implementing an appropriate driver. Furthermore, the results of these services are not comparable because they use different formats. This prevents comparing the performance of these services as well as combining them. We address this problem by proposing NERD, a framework which unifies 10 popular named entity extractors available on the web, and the NERD ontology, which provides a rich set of axioms aligning the taxonomies of these tools.
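The "one driver per service" problem can be pictured as a common interface that every driver implements, returning a single normalized entity record. The sketch below is a hypothetical illustration: `Entity`, `ExtractorDriver`, `DictionaryDriver` and `run_all` are invented names, and the toy gazetteer driver stands in for a real driver, which would call a service's HTTP API (not shown here).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Entity:
    """Normalized entity record shared by all drivers (hypothetical schema)."""
    surface: str          # text span as it appears in the document
    start: int            # character offset, inclusive
    end: int              # character offset, exclusive
    nerd_type: str        # type aligned to a shared ontology
    uri: Optional[str]    # disambiguation URI, if the service provides one
    extractor: str        # name of the service that produced the entity


class ExtractorDriver(Protocol):
    """Interface every driver is assumed to implement."""
    name: str

    def extract(self, text: str) -> List[Entity]: ...


class DictionaryDriver:
    """Toy stand-in for a remote service: finds known names via a gazetteer."""

    def __init__(self, name: str, gazetteer: Dict[str, Tuple[str, str]]):
        self.name = name
        self.gazetteer = gazetteer  # surface form -> (nerd_type, uri)

    def extract(self, text: str) -> List[Entity]:
        found = []
        for surface, (nerd_type, uri) in self.gazetteer.items():
            start = text.find(surface)
            while start != -1:  # record every occurrence, not just the first
                found.append(Entity(surface, start, start + len(surface),
                                    nerd_type, uri, self.name))
                start = text.find(surface, start + 1)
        return sorted(found, key=lambda e: e.start)


def run_all(drivers: List[ExtractorDriver], text: str) -> Dict[str, List[Entity]]:
    """Run every driver on the same text; results share one format."""
    return {d.name: d.extract(text) for d in drivers}
```

Because every driver emits the same `Entity` shape, downstream code (comparison, scoring, combination) never needs to know which service produced a result.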

130 citations


Cites background or methods from "NERD: A framework for evaluating na..."

  • ...This set of evaluations is further used to compute statistics about precision scores for each tool, with the goal to highlight strengths and weaknesses and to compare them (Rizzo and Troncy, 2011b)....

  • ...The main purpose of this interface is to enable a human user to assess the quality of the extraction results collected by those tools (Rizzo and Troncy, 2011a)....

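The precision statistics mentioned above can be sketched as the fraction of rated items judged correct, computed per tool and per assessed aspect (entity, type, URI). This is a minimal illustration assuming each human rating is a simple correct/incorrect judgment; the paper's actual scoring scheme may differ, and the tool names in the example are placeholders, not real results.

```python
from collections import Counter


def precision_by_tool(ratings):
    """ratings: iterable of (tool, aspect, is_correct) triples from human assessors.

    Returns {(tool, aspect): fraction of rated items judged correct}.
    The binary correct/incorrect judgment is an assumption of this sketch.
    """
    total = Counter()
    correct = Counter()
    for tool, aspect, is_correct in ratings:
        key = (tool, aspect)
        total[key] += 1
        if is_correct:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Placeholder ratings for two hypothetical tools.
    sample = [
        ("tool_a", "entity", True),
        ("tool_a", "entity", False),
        ("tool_a", "type", True),
        ("tool_b", "entity", True),
    ]
    print(precision_by_tool(sample))
```

Keeping the aspect in the key lets the same ratings table answer both "which tool extracts better?" and "which tool types or disambiguates better?".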

Journal Article
TL;DR: An effort to map the current research topics in Twitter focusing on three major areas: the structure and properties of the social graph, sentiment analysis and threats such as spam, bots, fake news and hate speech is presented.
Abstract: Twitter is the third most popular worldwide Online Social Network (OSN) after Facebook and Instagram. Compared to other OSNs, it has a simple data model and a straightforward data access API. This makes it ideal for social network studies attempting to analyze the patterns of online behavior, the structure of the social graph, the sentiment towards various entities and the nature of malicious attacks in a vivid network with hundreds of millions of users. Indeed, Twitter has been established as a major research platform, used in more than ten thousand research articles over the last ten years. Although there are excellent review and comparison studies for most of the research that uses Twitter, there have been limited efforts to map this research terrain as a whole. Here we present an effort to map the current research topics in Twitter, focusing on three major areas: the structure and properties of the social graph, sentiment analysis, and threats such as spam, bots, fake news and hate speech. We also present Twitter’s basic data model and best practices for sampling and data access. This survey also lays the groundwork for the computational techniques used in these areas, such as Graph Sampling, Natural Language Processing and Machine Learning. Along with existing reviews and comparison studies, we also discuss the key findings and the state of the art in these methods. Overall, we hope that this survey will help researchers create a clear conceptual model of Twitter and act as a guide to further expand the topics presented.

118 citations

01 Jan 2012
TL;DR: NERD, an API and a front-end user interface powered by an ontology to unify various named entity extractors, is developed; its unified output is serialized in RDF according to the NIF specification and published back on the Linked Data cloud.
Abstract: We have often heard that data is the new oil. In particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. Several approaches have been proposed to extract knowledge from textual documents: extracting named entities, classifying them according to predefined taxonomies, and disambiguating them through URIs identifying real-world entities. As a step towards interconnecting the Web of documents via those entities, different extractors have been proposed. Although they share the same main purpose (extracting named entities), they differ in numerous aspects, such as their underlying dictionary or their ability to disambiguate entities. We have developed NERD, an API and a front-end user interface powered by an ontology to unify various named entity extractors. The unified result output is serialized in RDF according to the NIF specification and published back on the Linked Data cloud. We evaluated NERD with a dataset composed of five TED talk transcripts, a dataset composed of 1000 New York Times articles and a dataset composed of the 217 abstracts of the papers published at WWW 2011.
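As a rough picture of what "serialized in RDF according to the NIF specification" can look like, the sketch below emits Turtle for one document and its entity spans using offset-based (`#char=start,end`) URIs. It assumes the NIF 2.0 core vocabulary (`nif:Context`, `nif:RFC5147String`, `nif:anchorOf`, `nif:beginIndex`, `nif:endIndex`, `nif:referenceContext`) and the ITS 2.0 RDF properties `itsrdf:taIdentRef`/`itsrdf:taClassRef`; it is a hand-rolled illustration, not NERD's actual output, and the type URI namespace in the example is an assumption.

```python
# Namespaces assumed from the NIF 2.0 core ontology and the ITS 2.0 RDF mapping.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_literal(s):
    """Escape a Python string as a double-quoted Turtle literal."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def offset_uri(doc_uri, start, end):
    """Offset-based URI for the span [start, end) of the document."""
    return f"<{doc_uri}#char={start},{end}>"


def to_nif_turtle(doc_uri, text, entities):
    """entities: list of (start, end, entity_uri, type_uri), end exclusive."""
    context = offset_uri(doc_uri, 0, len(text))
    out = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        # The whole document is the context every entity span refers back to.
        f"{context} a nif:String, nif:Context, nif:RFC5147String ;",
        f"    nif:isString {turtle_literal(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for start, end, entity_uri, type_uri in entities:
        out += [
            "",
            f"{offset_uri(doc_uri, start, end)} a nif:String, nif:RFC5147String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {turtle_literal(text[start:end])} ;",
            f'    nif:beginIndex "{start}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{entity_uri}> ;",   # disambiguation link
            f"    itsrdf:taClassRef <{type_uri}> .",     # aligned entity type
        ]
    return "\n".join(out)


if __name__ == "__main__":
    print(to_nif_turtle(
        "http://example.org/doc1",
        "Paris is lovely.",
        # The NERD namespace below is assumed for illustration.
        [(0, 5, "http://dbpedia.org/resource/Paris", "http://nerd.eurecom.fr/ontology#Location")],
    ))
```

Offset-based URIs give every extracted span a stable identifier, which is what allows the unified results to be published and linked on the Linked Data cloud.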

64 citations

Journal Article
TL;DR: The potential of Twitter to aid in better understanding the impact of the natural environment on human health and wellbeing is shown, and recommendations for the better dissemination of public health messages through changes to the framing of messages are made.
Abstract: Evidence continues to grow supporting the idea that restorative environments, green exercise, and nature-based activities positively impact human health. Nature-deficit disorder, a journalistic term proposed to describe the ill effects of people’s alienation from nature, is not yet formally recognized as a medical diagnosis. However, over the past decade, the phrase has been enthusiastically taken up by some segments of the lay public. Social media, such as Twitter, with its opportunities to gather “big data” related to public opinions, offers a medium for exploring the discourse and dissemination around nature-deficit disorder and other nature–health concepts. In this paper, we report our experience of collecting more than 175,000 tweets, applying sentiment analysis to measure positive, neutral or negative feelings, and preliminarily mapping the impact on dissemination. Sentiment analysis is currently used to investigate the repercussions of events in social networks, scrutinize opinions about products and services, and understand various aspects of the communication in Web-based communities. Based on a comparison of nature-deficit-disorder “hashtags” and more generic nature hashtags, we make recommendations for the better dissemination of public health messages through changes to the framing of messages. We show the potential of Twitter to aid in better understanding the impact of the natural environment on human health and wellbeing.

51 citations


Cites background from "NERD: A framework for evaluating na..."

  • ...[45] have also validated the performance of AlchemyAPI on a number of datasets and in different contexts: Rizzo and Troncy found that AlchemyAPI is better at extracting named entities and categorizing them than other semantic entity extractors [44], such as Zemanta [46], OpenCalais [47], Extractiv [48] and DBpedia Spotlight [49]....

Proceedings Article
16 Jul 2012
TL;DR: This paper presents the RDFaCE approach for combining WYSIWYG text authoring with the creation of rich semantic annotations, and evaluates their accuracy and empirically shows that a combination of them yields superior results.
Abstract: Recently, practical approaches for managing and supporting the life-cycle of semantic content on the Web of Data have made considerable progress. However, the least developed aspect of the semantic content life-cycle is currently the user-friendly manual and semi-automatic creation of rich semantic content. In this paper we present the RDFaCE approach for combining WYSIWYG text authoring with the creation of rich semantic annotations. Our approach is based on providing four different views to the content authors: a classical WYSIWYG view, a WYSIWYM (What You See Is What You Mean) view making the semantic annotations visible, a fact view and the respective HTML/RDFa source code view. The views are synchronized such that changes made in one of the views automatically update the others. They provide different means of semantic content authoring for the different personas involved in the content creation life-cycle. For bootstrapping the semantic annotation process we integrate five different text annotation services. We evaluate their accuracy and empirically show that a combination of them yields superior results.

41 citations


Cites background from "NERD: A framework for evaluating na..."

  • ...The combination can be performed based on the agreement between two or more of the involved APIs....

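Agreement-based combination can be illustrated as simple majority voting over character spans: a span is kept only if at least `min_votes` distinct services report it, and the URI most of them agree on is chosen. This is a minimal sketch of one such scheme under those assumptions, not the exact method used by RDFaCE; `combine_by_agreement` is a hypothetical name.

```python
from collections import Counter, defaultdict


def combine_by_agreement(results, min_votes=2):
    """results: {extractor_name: [(start, end, uri), ...]}.

    Keeps spans reported by at least `min_votes` distinct extractors and picks
    the URI with the most votes; ties go to the URI seen first.
    """
    votes = defaultdict(set)      # span -> names of extractors that reported it
    uris = defaultdict(Counter)   # span -> vote count per candidate URI
    for name, entities in results.items():
        # dict.fromkeys drops duplicate reports while preserving order.
        for start, end, uri in dict.fromkeys(entities):
            votes[(start, end)].add(name)
            uris[(start, end)][uri] += 1
    combined = []
    for span in sorted(votes):
        if len(votes[span]) >= min_votes:
            best_uri, _ = uris[span].most_common(1)[0]
            combined.append((span[0], span[1], best_uri))
    return combined
```

Raising `min_votes` trades recall for precision: spans that only one service finds are dropped, while spans that several services agree on are kept with their consensus URI.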