NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Open Access, Posted Content

Nicky Ringland, +5 more

- 04 Jun 2019 -

arXiv: Computation and Language

TL;DR: This work describes NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank, which comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting.

Abstract:

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
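To make "layers of nesting" concrete, the following is a minimal sketch of how nested mentions might be held as token spans and how the depth of the deepest nesting chain could be counted. The `Mention` type, the `nesting_depth` helper, the example tokens, and the labels are all hypothetical illustrations; they are not the NNE release format or its 114-type label scheme. The sketch also assumes that two mentions over the identical span do not count as nested.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # entity type; illustrative only, not an NNE label


def contains(outer: Mention, inner: Mention) -> bool:
    """True if inner lies within outer and the two spans are not identical."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def nesting_depth(mentions: List[Mention]) -> int:
    """Number of layers in the deepest chain of nested mentions (0 if none)."""
    # Sort by start ascending, then by length descending, so that every
    # containing span is visited before the spans it contains.
    ordered = sorted(mentions, key=lambda m: (m.start, -(m.end - m.start)))
    depth = [1] * len(ordered)
    for i, inner in enumerate(ordered):
        for j in range(i):
            if contains(ordered[j], inner):
                depth[i] = max(depth[i], depth[j] + 1)
    return max(depth, default=0)


# Hypothetical example: tokens and labels are made up for illustration.
tokens = ["Bank", "of", "New", "South", "Wales"]
mentions = [
    Mention(0, 5, "ORG"),      # Bank of New South Wales
    Mention(2, 5, "STATE"),    # New South Wales
    Mention(4, 5, "COUNTRY"),  # Wales
]
print(nesting_depth(mentions))
```

A flat NER scheme would keep only the outermost span here; a nested scheme retains the inner spans as well, which is the extra semantic information the abstract refers to.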

Citations

Proceedings Article

Parallel Instance Query Network for Named Entity Recognition

Yongliang Shen, +7 more

TL;DR: Parallel Instance Query Network (PIQN) is proposed, which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner and outperforms previous state-of-the-art models.

Proceedings Article

Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing

Chao Lou, +2 more

TL;DR: This work resorts to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords, to model nested entities, and leverages the Eisner-Satta algorithm to perform partial marginalization and inference efficiently.

Proceedings Article

Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Mustafa Jarrar, +2 more

TL;DR: Wojood is a corpus for Arabic nested Named Entity Recognition (NER) that consists of about 550K Modern Standard Arabic and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date.

Proceedings Article

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

Arnav Mhaske, +6 more

TL;DR: The Naamapadam dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 major Indian languages from two language families.

Journal Article

A review: development of named entity recognition (NER) technology for aeronautical information intelligence

Mi Baigang, +1 more

- 24 May 2022 -

Artificial Intelligence Review

References

Report

Building a large annotated corpus of English: the Penn Treebank

Mitchell Marcus, +2 more

- 01 Jun 1993 -

Computational Linguistics

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.

Proceedings Article

Neural Architectures for Named Entity Recognition

Guillaume Lample, +4 more

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA) from 12 to 17 June 2016.

Proceedings Article

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

Erik Tjong Kim Sang, +1 more

TL;DR: This paper describes the CoNLL-2003 shared task on language-independent named entity recognition (NER), giving background on the data sets and the evaluation method, along with a general overview of the systems that participated in the task and their performance.

Journal Article

Assessing agreement on classification tasks: the kappa statistic

Jean Carletta

- 01 Jun 1996 -

Computational Linguistics

TL;DR: The authors discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.

Journal Article

GENIA corpus—a semantically annotated corpus for bio-textmining

Jin-Dong Kim, +3 more

- 03 Jul 2003 -

Bioinformatics

TL;DR: The GENIA corpus is a large corpus of 2,000 MEDLINE abstracts with more than 400,000 words and almost 100,000 annotations of biological terms, built for bio-text mining.

arXiv: Computation and Language

Mining Wiki Resources for Multilingual Named Entity Recognition

Alexander E. Richman, +1 more

Related Papers (5)

NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Re-ranking for joint named-entity recognition and linking

Generating Chinese Named Entity Data from a Parallel Corpus

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese.

Mining Wiki Resources for Multilingual Named Entity Recognition