Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging

doi:10.18653/V1/D17-1035

Open Access, Proceedings Article

- pp 338-348

TLDR

The paper shows that reporting a single performance score is insufficient for comparing non-deterministic approaches, proposes comparing score distributions based on multiple executions instead, and presents network architectures that both perform better and are more stable with respect to the remaining hyperparameters.

Abstract:

In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10⁻⁴) differences for state-of-the-art systems. For two recent systems for NER, we observe an absolute difference of one percentage point in F₁-score depending on the selected seed value, so that these systems are perceived as either state-of-the-art or mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50,000 LSTM-networks for five sequence tagging tasks, we present network architectures that both produce superior performance and are more stable with respect to the remaining hyperparameters.
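The idea of comparing score distributions rather than single scores can be sketched as follows. This is only an illustration under stated assumptions: the helper names `summarize` and `permutation_test` are hypothetical, the F₁ scores are simulated rather than taken from the paper, and the permutation test on the difference of means is one possible choice of significance test, not necessarily the one the authors used.

```python
import random
import statistics


def summarize(scores):
    """Describe a score distribution instead of reporting one number."""
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns an estimated p-value for the null hypothesis that both
    sets of scores come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(scores_a) - statistics.fmean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to the two systems.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical F1 scores, one per random seed. These are simulated
    # values for illustration only, NOT results from the paper.
    rng = random.Random(42)
    system_a = [rng.gauss(90.6, 0.3) for _ in range(20)]
    system_b = [rng.gauss(90.9, 0.3) for _ in range(20)]

    print("System A:", summarize(system_a))
    print("System B:", summarize(system_b))
    print("p-value :", permutation_test(system_a, system_b))
```

Each system is trained once per seed, and the resulting lists of scores are compared as whole distributions; picking the best run of one system and a typical run of another would hide exactly the seed-dependent variation the abstract describes.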

Citations

Proceedings Article

Contextual String Embeddings for Sequence Labeling

Alan Akbik, +2 more

TL;DR: This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.

Proceedings Article

Semi-Supervised Sequence Modeling with Cross-View Training

Kevin Clark, +3 more

TL;DR: Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.

Posted Content

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

Nils Reimers, +1 more

- 21 Apr 2020 -

arXiv: Computation and Language

TL;DR: An easy and efficient method to extend existing sentence embedding models to new languages by using the original (monolingual) model to generate sentence embeddings for the source language and then training a new system on translated sentences to mimic the original model.

Proceedings Article

Pooled Contextualized Embeddings for Named Entity Recognition.

Alan Akbik, +2 more

TL;DR: This work proposes a method that dynamically aggregates the contextualized embeddings of each unique string encountered and uses a pooling operation to distill a "global" word representation from all contextualized instances.

Proceedings Article

Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction

Hu Xu, +3 more

TL;DR: The authors propose a novel yet simple CNN model for aspect extraction that employs two types of pre-trained embeddings, general-purpose embeddings and domain-specific embeddings, and achieves surprisingly good results, outperforming sophisticated existing state-of-the-art methods.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Journal Article

Backpropagation applied to handwritten zip code recognition

Yann LeCun, +6 more

- 01 Dec 1989 -

Neural Computation

TL;DR: This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the architecture of the network; the approach is successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty, +2 more

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more
