Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham, and 3 more
pp. 140–147
TL;DR: The goal of this task is to provide a definition of emerging and of rare entities and, based on that, datasets for detecting these entities, and to evaluate the ability of participating entries to detect and classify novel and emerging named entities in noisy text.

Citations
Proceedings ArticleDOI
BERTweet: A pre-trained language model for English Tweets
TL;DR: BERTweet is the first large-scale pre-trained language model for English Tweets; it has the same architecture as BERT-base and is trained using the RoBERTa pre-training procedure.
Proceedings ArticleDOI
FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP
TL;DR: The core idea of the FLAIR framework is to present a simple, unified interface for conceptually very different types of word and document embeddings, which effectively hides all embedding-specific engineering complexity and allows researchers to "mix and match" various embeddings with little effort.
Journal ArticleDOI
A Survey on Deep Learning for Named Entity Recognition
TL;DR: A comprehensive review on existing deep learning techniques for NER is provided in this paper, where the authors systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder.
Posted Content
The Pushshift Reddit Dataset
TL;DR: The Pushshift Reddit dataset makes it possible for social media researchers to reduce time spent in the data collection, cleaning, and storage phases of their projects.
References
Proceedings Article
Efficient Named Entity Annotation through Pre-empting
Leon Derczynski, Kalina Bontcheva, and 1 more
TL;DR: A technique for reducing the amount of entity-less text examined by annotators, called "pre-empting", is demonstrated and evaluated in a crowdsourcing scenario, where it provides downstream performance improvements for the same corpus size.