Learning multilingual named entity recognition from Wikipedia

doi:10.1016/J.ARTINT.2012.03.006

Joel Nothman, +4 more

- 01 Jan 2013 -

Artificial Intelligence

- Vol. 194, pp 151-175

TLDR

The approach outperforms other approaches to automatic NE annotation, competes with gold-standard training when tested on an evaluation corpus from a different source, and performs 10% better than newswire-trained models on manually annotated Wikipedia text.
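The summary does not spell out how Wikipedia is turned into annotated text. As a rough illustration of the general idea behind anchor-text annotation, the Python sketch below projects an entity type from each link's target article onto the linked tokens as IOB2 tags. It is not the authors' pipeline: the `ARTICLE_TYPES` mapping, the `project_links` function and the link format are hypothetical, and in practice the article-to-type mapping would itself have to be learned.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from linked Wikipedia article titles to coarse NE types.
# In a real system this mapping would be produced by classifying the articles.
ARTICLE_TYPES: Dict[str, str] = {
    "Victor_Hugo": "PER",
    "Paris": "LOC",
}

# A link covers tokens[start:end] and points at a target article title.
Link = Tuple[int, int, str]


def project_links(tokens: List[str], links: List[Link],
                  article_types: Dict[str, str]) -> List[str]:
    """Give each linked token an IOB2 tag from the type of its target article."""
    tags = ["O"] * len(tokens)  # unlinked tokens stay outside any entity
    for start, end, target in links:
        ne_type = article_types.get(target)
        if ne_type is None:
            continue  # target not known to be an entity: leave its tokens as O
        tags[start] = f"B-{ne_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{ne_type}"
    return tags


if __name__ == "__main__":
    tokens = ["Victor", "Hugo", "lived", "in", "Paris", "."]
    links = [(0, 2, "Victor_Hugo"), (4, 5, "Paris")]
    print(project_links(tokens, links, ARTICLE_TYPES))
    # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']
```

Tokens outside any link stay `O`, so in this sketch any mention that an editor left unlinked receives no entity label.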

About:

This article is published in Artificial Intelligence. It was published on 2013-01-01 and is currently open access. It has received 338 citations to date. The article focuses on the topics of named-entity recognition and anchor text.

Citations

Proceedings Article

Neural Architectures for Named Entity Recognition

Guillaume Lample, +4 more

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA) on 12 to 17 June 2016.

Proceedings Article

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Peng Qi, +4 more

TL;DR: This work introduces Stanza, an open-source Python natural language processing toolkit supporting 66 human languages that features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.

Journal Article

Knowledge graph refinement: A survey of approaches and evaluation methods

Heiko Paulheim

- 06 Dec 2016 -

Semantic Web

TL;DR: A survey of knowledge graph refinement approaches, taking a dual look at both the methods proposed and the evaluation methodologies used.

Proceedings Article

FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP

Alan Akbik, +5 more

TL;DR: The core idea of the FLAIR framework is to present a simple, unified interface for conceptually very different types of word and document embeddings, which hides all embedding-specific engineering complexity and allows researchers to "mix and match" various embeddings with little effort.
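The "mix and match" idea can be illustrated with a small Python sketch: every embedder exposes the same `embed` method, so a stacking wrapper can concatenate their outputs without knowing how each vector is computed. The class names are hypothetical and this does not use FLAIR's API (FLAIR provides its own `StackedEmbeddings` class for this role); the two toy embedders merely stand in for real word-level or character-level embeddings.

```python
from typing import Dict, List, Protocol, Sequence


class TokenEmbedder(Protocol):
    """Hypothetical common interface: every embedder maps a token to a vector."""

    def embed(self, token: str) -> List[float]: ...


class LookupEmbedder:
    """Toy word-table embedding; unknown words get a zero vector."""

    def __init__(self, table: Dict[str, List[float]], dim: int) -> None:
        self.table = table
        self.dim = dim

    def embed(self, token: str) -> List[float]:
        return list(self.table.get(token.lower(), [0.0] * self.dim))


class ShapeEmbedder:
    """Toy hand-crafted features: [starts with a capital letter, token length]."""

    def embed(self, token: str) -> List[float]:
        return [1.0 if token[:1].isupper() else 0.0, float(len(token))]


class StackedEmbedder:
    """Concatenate the outputs of several embedders behind the same interface."""

    def __init__(self, embedders: Sequence[TokenEmbedder]) -> None:
        self.embedders = list(embedders)

    def embed(self, token: str) -> List[float]:
        vector: List[float] = []
        for embedder in self.embedders:
            vector.extend(embedder.embed(token))
        return vector


if __name__ == "__main__":
    words = LookupEmbedder({"paris": [0.5, -0.25]}, dim=2)
    stacked = StackedEmbedder([words, ShapeEmbedder()])
    print(stacked.embed("Paris"))  # [0.5, -0.25, 1.0, 5.0]
```

Because the stacker depends only on the shared interface, swapping in a different embedder changes one constructor argument rather than the downstream model code.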

References

Report

Building a large annotated corpus of English: the Penn Treebank

Mitchell Marcus, +2 more

- 01 Jun 1993 -

Computational Linguistics

TL;DR: As a result of this grant, the researchers have published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, including a fully hand-parsed version of the classic Brown corpus.

Journal Article

LIBLINEAR: A Library for Large Linear Classification

Rong-En Fan, +4 more

- 01 Jun 2008 -

Journal of Machine Learning Research

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
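For reference, the L2-regularised logistic regression problem that LIBLINEAR solves is to minimise `0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i))` over weights `w`, with labels `y_i` in {-1, +1}. The Python sketch below minimises that objective with plain gradient descent, only to make the objective concrete. LIBLINEAR itself uses more efficient solvers (such as a trust-region Newton method), and the function names, step size and iteration count here are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def log1pexp(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(w: Vector, X: Sequence[Vector], y: Sequence[int], C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i))."""
    return 0.5 * dot(w, w) + C * sum(log1pexp(-yi * dot(w, xi)) for xi, yi in zip(X, y))


def train(X: Sequence[Vector], y: Sequence[int], C: float = 1.0,
          step: float = 0.1, iters: int = 500) -> List[float]:
    """Plain gradient descent on the objective (not LIBLINEAR's own solver)."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = list(w)  # gradient of the 0.5 * ||w||^2 term
        for xi, yi in zip(X, y):
            # d/dw log(1 + exp(-y w.x)) = -y * sigmoid(-y w.x) * x
            coef = -C * yi * sigmoid(-yi * dot(w, xi))
            for j, xij in enumerate(xi):
                grad[j] += coef * xij
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def predict(w: Vector, x: Vector) -> int:
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    # The first column is a constant 1.0 so that w[0] acts as a bias term.
    X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
    y = [1, 1, -1, -1]
    w = train(X, y)
    print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

`C` trades off the regulariser against the data term: larger values fit the training data more closely.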

Proceedings Article

Yago: a core of semantic knowledge

Fabian M. Suchanek, +2 more

TL;DR: YAGO is a lightweight and extensible ontology with high coverage and quality, which includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASONEPRIZE).
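An Is-A hierarchy can be pictured as a graph of class-membership edges. The Python sketch below stores facts as (subject, relation, object) triples and collects every class an entity belongs to by following `type` and `subClassOf` edges transitively, while ignoring non-taxonomic relations. The relation names, sample facts and the `superclasses` function are illustrative assumptions, not YAGO's actual data or schema.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
TAXONOMIC = ("type", "subClassOf")  # hypothetical relation names


def superclasses(entity: str, facts: List[Triple]) -> Set[str]:
    """All classes reachable from an entity via taxonomic edges (breadth-first)."""
    parents: Dict[str, List[str]] = {}
    for subj, rel, obj in facts:
        if rel in TAXONOMIC:  # non-taxonomic relations are skipped
            parents.setdefault(subj, []).append(obj)
    seen: Set[str] = set()
    queue = deque(parents.get(entity, []))
    while queue:
        cls = queue.popleft()
        if cls not in seen:  # the seen set also guards against cycles
            seen.add(cls)
            queue.extend(parents.get(cls, []))
    return seen


if __name__ == "__main__":
    facts = [
        ("Albert_Einstein", "type", "physicist"),
        ("physicist", "subClassOf", "scientist"),
        ("scientist", "subClassOf", "person"),
        ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ]
    print(sorted(superclasses("Albert_Einstein", facts)))
    # ['person', 'physicist', 'scientist']
```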

Proceedings Article

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

Erik Tjong Kim Sang, +1 more

TL;DR: The paper describes the CoNLL-2003 shared task on language-independent named entity recognition (NER), gives background on its data sets and evaluation method, and presents a general overview of the participating systems and their performance.
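The CoNLL-2003 task scores systems with precision, recall and F1 over whole entities: a predicted entity counts only if it exactly matches a gold entity in both span and type. A minimal Python sketch of that metric follows. It assumes IOB2 tags (the original data files may need converting first), the function names are hypothetical, and it is not the official evaluation script.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def iob2_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from IOB2 tags.

    An I- tag that does not continue an open entity of the same type is
    treated as the start of a new entity.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if start is not None and not continues:
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, ent_type = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = iob2_spans(g_tags), iob2_spans(p_tags)
        tp += len(g & p)  # only identical (start, end, type) triples match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

A span with the right boundaries but the wrong type (LOC versus ORG above) earns no credit, which is what makes this metric stricter than token-level accuracy.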

Posted Content

NLTK: The Natural Language Toolkit

Edward Loper, +1 more

- 17 May 2002 -

arXiv: Computation and Language

TL;DR: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware that covers symbolic and statistical natural language processing.

Related Papers (5)

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

Neural Architectures for Named Entity Recognition

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

Natural Language Processing (Almost) from Scratch