Journal ArticleDOI

Geographic Named Entity Recognition and Disambiguation in Mexican News using word embeddings

TLDR
This study shows that relationships between geographic and semantic spaces arise when word embedding models are applied to a corpus of documents in Mexican Spanish, and that the resulting models achieve high accuracy for geographic named entity recognition in Spanish.
Abstract
In recent years, dense word embeddings have been widely used for text representation because they can model complex semantic and morphological characteristics of language, such as meaning in specific contexts and applications. In contrast to sparse representations, such as one-hot encodings or frequency counts, word embeddings provide computational advantages and improved results in many natural language processing tasks, such as the automatic extraction of geospatial information. Computer systems capable of discovering geographic information in natural language rely on a complex process called geoparsing. In this work, we explore the use of word embeddings for two NLP tasks, Geographic Named Entity Recognition and Geographic Entity Disambiguation, as an effort to develop the first Mexican Geoparser. Our study shows that relationships between geographic and semantic spaces arise when we apply word embedding models to a corpus of documents in Mexican Spanish. Our models achieved high accuracy for geographic named entity recognition in Spanish.
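
As a rough illustration of how such geographic-semantic relationships can be probed, the sketch below trains word embeddings on a toy, stand-in tokenized corpus of Mexican Spanish news with gensim and inspects the nearest neighbors of a state name; the corpus, tokens, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch, assuming a tokenized corpus of Mexican Spanish news is
# available as `sentences` (lists of lowercased tokens); everything below
# is illustrative, not the paper's configuration.
from gensim.models import Word2Vec

sentences = [
    ["el", "gobernador", "de", "oaxaca", "visitó", "monterrey"],
    ["fuertes", "lluvias", "en", "chiapas", "y", "oaxaca"],
    # ... a real corpus would contain many thousands of news articles
]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the dense embeddings
    window=5,          # context window around each token
    min_count=1,       # keep rare tokens in this toy corpus
    workers=4,
)

# If geographic structure is reflected in semantic space, the nearest
# neighbors of a state name should include other place names.
print(model.wv.most_similar("oaxaca", topn=5))
```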


Citations
Journal ArticleDOI

Bridge inspection named entity recognition via BERT and lexicon augmented machine reading comprehension neural model

TL;DR: A novel lexicon-augmented, machine reading comprehension-based NER neural model is proposed for identifying flat and nested entities in Chinese bridge inspection text; results show that the proposed model outperforms other mainstream NER models on the bridge inspection corpus.
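
For readers unfamiliar with the machine reading comprehension (MRC) framing of NER, the sketch below shows the core idea: each entity type becomes a natural-language query, and entity spans are extracted from the passage as answers. It is a generic illustration with an untrained QA head on a stock checkpoint, not the lexicon-augmented model from the paper; the checkpoint name and example strings are assumptions.

```python
# MRC-style NER sketch: query + passage -> answer span. Nested entities fall
# out naturally because multiple (start, end) pairs can be read off the
# logits of one passage.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-chinese")

query = "找出文本中提到的桥梁构件。"      # "Find the bridge components mentioned."
passage = "检查发现主梁和桥墩存在裂缝。"  # "Inspection found cracks in the girder and piers."

inputs = tokenizer(query, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Spans with high start+end scores are candidate entities (the QA head here
# is randomly initialized, so the output is meaningless until fine-tuned).
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```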
Journal ArticleDOI

Legal Text Recognition Using LSTM-CRF Deep Learning Model

TL;DR: Parameter learning with the log-likelihood criterion outperforms the maximum-margin criterion and is well suited to the Bi-LSTM-CRF model, which is in turn more suitable for recognizing extended entities.
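
The sketch below shows a minimal Bi-LSTM-CRF trained by maximizing the CRF log-likelihood, the criterion discussed in the TL;DR; it assumes the third-party pytorch-crf package, and all dimensions and tag counts are illustrative.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.fc(self.lstm(self.embed(tokens))[0])
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(emissions, tags)

    def decode(self, tokens):
        emissions = self.fc(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions)  # Viterbi-best tag sequences

model = BiLSTMCRF(vocab_size=5000, num_tags=7)
tokens = torch.randint(0, 5000, (2, 10))  # batch of 2 sentences, 10 tokens
tags = torch.randint(0, 7, (2, 10))
print(model.loss(tokens, tags))           # scalar training loss
```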
Journal ArticleDOI

Chinese Named Entity Recognition in the Geoscience Domain Based on BERT

TL;DR: In this article, a deep learning-based geological named entity recognition model is proposed that obtains character vectors rich in semantic information from the BERT pretrained language model, alleviating the lack of specificity of static word vectors (e.g., word2vec) and improving the extraction of complex geological entities.
Peer ReviewDOI

Chinese Named Entity Recognition in the Geoscience Domain Based on BERT

TL;DR: An integrated deep learning model incorporating BERT, BiGRU, and CRF is constructed; it obtains character vectors rich in semantic information through the BERT pretrained language model to compensate for the lack of specificity of static word vectors and to improve the extraction of complex geological entities.
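
A skeletal version of such a BERT + BiGRU + CRF stack might look as follows; the checkpoint name and layer sizes are assumptions for illustration, and this is not the authors' code. BERT supplies contextual character vectors, a BiGRU re-encodes them, and a CRF scores tag sequences.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf
from transformers import AutoModel

class BertBiGRUCRF(nn.Module):
    def __init__(self, num_tags, hidden_dim=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-chinese")
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden_dim // 2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # Contextual character vectors from the pretrained language model.
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        emissions = self.fc(self.gru(hidden)[0])
        if tags is not None:
            # Training: negative CRF log-likelihood of the gold tags.
            return -self.crf(emissions, tags, mask=attention_mask.bool())
        # Inference: Viterbi decoding over the tag sequence.
        return self.crf.decode(emissions, mask=attention_mask.bool())
```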
Journal ArticleDOI

ACE-ADP: Adversarial Contextual Embeddings Based Named Entity Recognition for Agricultural Diseases and Pests

TL;DR: An adversarial contextual embeddings-based model named ACE-ADP is proposed for named entity recognition in the Chinese agricultural diseases and pests domain; experiments demonstrate that it not only effectively extracts rare entities but also retains a strong ability to predict new entities in new datasets with high accuracy.
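
Adversarial training on embeddings is often implemented as a small gradient-direction perturbation of the embedding matrix (FGM-style); the sketch below shows that generic pattern only, not the exact ACE-ADP algorithm, and the attribute name used to locate the embedding layer is an assumption.

```python
import torch

class FGM:
    """Add an epsilon-scaled gradient perturbation to the embedding weights."""
    def __init__(self, model, embed_name="embed", epsilon=1.0):
        self.model, self.embed_name, self.epsilon = model, embed_name, epsilon
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.embed_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    # Step in the gradient direction: the worst-case small
                    # perturbation to first order.
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical training step:
#   loss.backward(); fgm.attack(); adv_loss.backward(); fgm.restore();
#   optimizer.step()
```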
References
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
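
The GloVe objective the TL;DR alludes to is a weighted least-squares fit of word-vector dot products to log co-occurrence counts, J = Σ_ij f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)²; the sketch below spells it out in numpy, with variable names of our choosing.

```python
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective over nonzero co-occurrences."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):
        # f(X_ij): clipped power-law weighting that caps frequent pairs.
        weight = min((X[i, j] / x_max) ** alpha, 1.0)
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += weight * diff ** 2
    return loss
```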
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
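
The "one additional output layer" recipe can be sketched with the Hugging Face transformers library, here for token classification (the NER-style setting relevant to geoparsing); the checkpoint, label count, and example sentence are illustrative.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# A linear classification layer is placed on top of the pretrained encoder;
# only this head is new, everything else is fine-tuned from the checkpoint.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=9  # e.g., BIO tags for a small NER scheme
)

inputs = tokenizer("Oaxaca is a state in Mexico", return_tensors="pt")
logits = model(**inputs).logits      # shape: (batch, tokens, num_labels)
print(logits.shape)
```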
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
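
At the core of these architectures, skip-gram models the probability of a context word given a center word as a softmax over vector dot products; below is a tiny numpy illustration, with random vectors standing in for trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                      # vocabulary size, embedding dimension
W_in = rng.normal(size=(V, d))    # center-word vectors
W_out = rng.normal(size=(V, d))   # context-word vectors

def p_context_given_center(center_id):
    scores = W_out @ W_in[center_id]
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

# Training maximizes log p(context | center) over (center, context) pairs
# drawn from a sliding window; the CBOW architecture inverts the roles.
print(p_context_given_center(3))
```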
Journal ArticleDOI

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word vector is the sum of these n-gram representations, allowing models to be trained quickly on large corpora and word representations to be computed for words that did not appear in the training data.
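
The subword idea can be sketched in a few lines: extract character n-grams with boundary markers and sum their vectors, which also yields vectors for out-of-vocabulary words. The n-gram table below is a random stand-in for fastText's trained, hashed table.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with < and > as boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

rng = np.random.default_rng(0)
d = 8
ngram_vectors = {}  # in fastText, a hashed table trained with skip-gram

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        ngram_vectors.setdefault(g, rng.normal(size=d))
    return sum(ngram_vectors[g] for g in grams)

print(char_ngrams("where", 3, 4))  # ['<wh', 'whe', 'her', 'ere', 're>', ...]
print(word_vector("geoparser"))    # defined even if unseen during training
```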
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., it models polysemy).
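
What "contextualized" buys is that the same surface word receives different vectors in different sentences; the sketch below demonstrates this, with a BERT checkpoint standing in for ELMo purely for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

def vector_of(sentence, word):
    """Contextual vector of the first occurrence of `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    idx = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]).index(word)
    return hidden[idx]

v1 = vector_of("She sat by the river bank.", "bank")
v2 = vector_of("He deposited cash at the bank.", "bank")
# Cosine similarity well below 1.0: the two senses get distinct vectors.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```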