Learning multilingual named entity recognition from Wikipedia
TL;DR
The approach outperforms other approaches to automatic NE annotation; competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually annotated Wikipedia text.
About:
This article was published in Artificial Intelligence on 2013-01-01 and is currently open access. It has received 338 citations to date. The article focuses on the topics: Named-entity recognition & Anchor text.
Citations
Proceedings Article
Neural Architectures for Named Entity Recognition
TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12 to 17, 2016.
Proceedings Article
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
TL;DR: This work introduces Stanza, an open-source Python natural language processing toolkit supporting 66 human languages that features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
Journal Article
Knowledge graph refinement: A survey of approaches and evaluation methods
TL;DR: A survey of knowledge graph refinement approaches that looks at both the methods being proposed and the evaluation methodologies used.
Proceedings Article
FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP
TL;DR: The core idea of the FLAIR framework is to present a simple, unified interface for conceptually very different types of word and document embeddings, which effectively hides all embedding-specific engineering complexity and allows researchers to “mix and match” various embeddings with little effort.
Posted Content
DyNet: The Dynamic Neural Network Toolkit
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin +24 more
TL;DR: DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure that has an optimized C++ backend and lightweight graph representation and is designed to allow users to implement their models in a way that is idiomatic in their preferred programming language.
References
Report
Building a large annotated corpus of English: the Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal Article
LIBLINEAR: A Library for Large Linear Classification
TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Proceedings Article
Yago: a core of semantic knowledge
TL;DR: YAGO is a lightweight and extensible ontology with high coverage and quality, which includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize).
Proceedings Article
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
TL;DR: This paper describes the CoNLL-2003 shared task on language-independent named entity recognition (NER), giving background on the data sets and evaluation method, along with a general overview of the systems that participated in the task and their performance.
Posted Content
NLTK: The Natural Language Toolkit
Edward Loper, Steven Bird +1 more
TL;DR: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware that covers symbolic and statistical natural language processing.