Non-lexical neural architecture for fine-grained POS Tagging
Matthieu Labeau, Kevin Löser, Alexandre Allauzen
pp. 232–237
Abstract
In this paper we explore a POS tagging application of neural architectures that can infer word representations from the raw character stream. It relies on two modelling stages that are jointly learnt: a convolutional network that infers a word representation directly from the character stream, followed by a prediction stage. Models are evaluated on a POS and morphological tagging task for German. Experimental results show that the convolutional network can infer meaningful word representations, while for the prediction stage a well-designed structured strategy allows the model to outperform state-of-the-art results without any feature engineering.
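The first stage of the two-stage architecture described above can be illustrated with a minimal sketch: a character-level convolutional encoder that builds a word vector by convolving character embeddings and max-pooling over time. All names, sizes, and the alphabet below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative character inventory (German letters included); assumed, not the paper's.
CHARS = "abcdefghijklmnopqrstuvwxyzäöüß"
char2id = {c: i for i, c in enumerate(CHARS)}

d_char = 8        # character embedding size (assumed)
width = 3         # convolution filter width (assumed)
n_filters = 16    # number of filters = resulting word-vector size (assumed)

E = rng.normal(size=(len(CHARS), d_char))          # character embeddings
W = rng.normal(size=(n_filters, width * d_char))   # convolution filters
b = np.zeros(n_filters)

def word_vector(word):
    """Embed characters, convolve over the sequence, max-pool over time."""
    ids = [char2id[c] for c in word.lower() if c in char2id]
    X = E[ids]                                     # (len(word), d_char)
    # pad so that even very short words yield at least one window
    if X.shape[0] < width:
        X = np.vstack([X, np.zeros((width - X.shape[0], d_char))])
    windows = np.stack([X[i:i + width].ravel()
                        for i in range(X.shape[0] - width + 1)])
    H = np.tanh(windows @ W.T + b)                 # (n_windows, n_filters)
    return H.max(axis=0)                           # max over time -> word vector

v = word_vector("Straße")
print(v.shape)   # (16,)
```

Because the max-pool runs over however many character windows the word produces, the encoder maps words of any length to a fixed-size vector, which is what lets the model handle rare and unseen word forms without a lexicon.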
Citations
Proceedings Article
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Xuezhe Ma, Eduard Hovy
TL;DR: This paper combines a bidirectional LSTM, a CNN, and a CRF for sequence labeling, achieving state-of-the-art performance on the Penn Treebank WSJ corpus for POS tagging and the CoNLL 2003 corpus for NER.
Journal Article
Named Entity Recognition with Bidirectional LSTM-CNNs
Jason P.C. Chiu, Eric Nichols
TL;DR: In this article, a hybrid bidirectional LSTM and CNN architecture is proposed that automatically detects word- and character-level features, eliminating the need for feature engineering and lexicons while achieving high performance.
Proceedings Article
Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
TL;DR: The authors compare bi-LSTMs using word, character, and Unicode byte embeddings for POS tagging and show that bi-LSTMs are less sensitive to training-data size and label corruption than previously assumed.
Journal Article
De-identification of patient notes with recurrent neural networks.
TL;DR: Introduces the first de-identification system based on artificial neural networks (ANNs); unlike existing systems, it requires no handcrafted features or rules, and it outperforms state-of-the-art systems.
Proceedings Article
NeuroNER: an easy-to-use program for named-entity recognition based on neural networks
TL;DR: NeuroNER is an easy-to-use named-entity recognition tool based on ANNs; users can annotate entities through BRAT, a graphical web-based annotation interface.
References
Proceedings Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing them to find needles in haystacks in the form of very predictive but rarely seen features; adaptively modifying the proximal function also significantly simplifies setting a learning rate and yields regret guarantees provably as good as the best proximal function chosen in hindsight.
Journal Article
A neural probabilistic language model
TL;DR: The authors propose to learn a distributed representation for words that allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Journal Article
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
TL;DR: The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R₀ and whose performance bears certain similarities to that of sequential decoding algorithms.
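The Viterbi algorithm introduced in the entry above is the standard dynamic program behind structured tag prediction of the kind the paper evaluates. A minimal sketch (an illustrative implementation, not the paper's own) over per-position emission scores and a tag-transition matrix:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence (list of tag indices).

    emissions:   (T, K) array, score of tag k at position t
    transitions: (K, K) array, score of moving from tag i to tag j
    """
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        # candidate scores for every (previous tag, current tag) pair
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow backpointers from the best final tag
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

path = viterbi(np.array([[2., 0.], [0., 2.]]), np.zeros((2, 2)))
print(path)   # [0, 1]
```

The dynamic program costs O(T·K²) rather than the O(Kᵀ) of enumerating all tag sequences, which is what makes exact decoding over a structured output space practical.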
Journal Article
Natural Language Processing (Almost) from Scratch
TL;DR: A unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling is proposed.