Open Access, Posted Content

NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Abstract
Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
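
To make the notion of nesting layers concrete, here is a minimal Python sketch that represents mentions as token spans and computes the layer of each one. The `Mention` type, the labels, and the example phrase are illustrative assumptions; they are not the NNE file format or its type inventory.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # hypothetical label, not necessarily an NNE entity type


def strictly_contains(outer: Mention, inner: Mention) -> bool:
    """True if inner's span lies inside outer's span and the spans differ."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def layer(m: Mention, mentions: List[Mention]) -> int:
    """Layer 1 is an outermost mention; each enclosing mention adds one."""
    return 1 + sum(strictly_contains(o, m) for o in mentions)


def max_layers(mentions: List[Mention]) -> int:
    """Deepest nesting layer found among the mentions (0 if there are none)."""
    return max((layer(m, mentions) for m in mentions), default=0)


# Hypothetical example: "University of New South Wales in Sydney"
tokens = ["University", "of", "New", "South", "Wales", "in", "Sydney"]
mentions = [
    Mention(0, 5, "ORG"),     # University of New South Wales
    Mention(2, 5, "REGION"),  # New South Wales, nested inside the ORG
    Mention(6, 7, "CITY"),    # Sydney, not nested
]
layers = [layer(m, mentions) for m in mentions]
```

A flat NER scheme would keep only the outermost span in this example, whereas a nested scheme also records the inner `REGION` mention. Mentions that share an identical span are assigned the same layer here, which is a simplification chosen for the sketch.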


Citations
Proceedings Article

Parallel Instance Query Network for Named Entity Recognition

TL;DR: Parallel Instance Query Network (PIQN) is proposed, which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner and outperforms previous state-of-the-art models.
Proceedings Article

Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing

TL;DR: This work resorts to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords, to model nested entities, and leverages the Eisner-Satta algorithm to perform partial marginalization and inference efficiently.
Proceedings Article

Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

TL;DR: Wojood is a corpus for Arabic nested Named Entity Recognition (NER) that consists of about 550K Modern Standard Arabic and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date.
Proceedings Article

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

TL;DR: The Naamapadam dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 of the 11 major Indian languages from two language families.
References
Report

Building a large annotated corpus of English: the Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Proceedings Article

Neural Architectures for Named Entity Recognition

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12 to 17, 2016.
Proceedings Article

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

TL;DR: This paper describes the CoNLL-2003 shared task on language-independent named entity recognition (NER), covering its data sets and evaluation method, and gives a general overview of the systems that participated in the task and their performance.
Journal Article

Assessing agreement on classification tasks: the kappa statistic

TL;DR: The authors discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.
Journal Article

GENIA corpus—a semantically annotated corpus for bio-textmining

TL;DR: The GENIA corpus is a large corpus of 2,000 MEDLINE abstracts with more than 400,000 words and almost 100,000 annotations for biological terms, intended for bio-text mining.