Open Access, Posted Content
NNE: A Dataset for Nested Named Entity Recognition in English Newswire
TLDR
This work describes NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank, which comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting.
Abstract:
Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
Citations
Proceedings Article
Parallel Instance Query Network for Named Entity Recognition
Yongliang Shen, Xiaobin Wang, Zeqi Tan, Guangwei Xu, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang
TL;DR: Parallel Instance Query Network (PIQN) is proposed, which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner and outperforms previous state-of-the-art models.
Proceedings Article
Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing
Chao Lou, Songlin Yang, Kewei Tu
TL;DR: This work resorts to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords, to model nested entities, and leverages the Eisner-Satta algorithm to perform partial marginalization and inference efficiently.
Proceedings Article
Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
TL;DR: Wojood is a corpus for Arabic nested Named Entity Recognition (NER) that consists of about 550K Modern Standard Arabic and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date.
Proceedings Article
Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
Arnav Mhaske, Harsh Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Narendra Kumar, Rudramurthy, Anoop Kunchukuttan
TL;DR: The Naamapadam dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 major Indian languages from two language families.