GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Citations
29,480 citations
Cites background or methods from "GLUE: A Multi-Task Benchmark and An..."
...QNLI Question Natural Language Inference is a version of the Stanford Question Answering Dataset (Rajpurkar et al., 2016) which has been converted to a binary classification task (Wang et al., 2018)....
[...]
...The GLUE benchmark includes the following datasets, the descriptions of which were originally summarized in Wang et al. (2018): MNLI Multi-Genre Natural Language Inference is a large-scale, crowdsourced entailment classification task (Williams et al., 2018)....
[...]
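As a rough illustration of the QNLI-style conversion described in the excerpt above, each (question, sentence) pair drawn from a SQuAD context becomes one binary classification example. This is a hedged sketch with illustrative field names (`question`, `sentence`, `label`), not the official GLUE preprocessing:

```python
# Hypothetical sketch: turning a SQuAD-style QA item into QNLI-style
# binary (question, sentence) pairs. Field names are illustrative only.
def squad_to_qnli(question, context_sentences, answer_sentence_idx):
    """Pair the question with each context sentence; label 1 if that
    sentence contains the answer span, else 0."""
    return [
        {"question": question, "sentence": sent,
         "label": 1 if i == answer_sentence_idx else 0}
        for i, sent in enumerate(context_sentences)
    ]

pairs = squad_to_qnli(
    "Where was the treaty signed?",
    ["The treaty was signed in Paris.", "It ended the war."],
    answer_sentence_idx=0,
)
# One positive pair (answer-bearing sentence) and one negative pair.
```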
...At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved previously state-of-the-art results on many sentence-level tasks from the GLUE benchmark (Wang et al., 2018)....
[...]
13,994 citations
Cites background or methods from "GLUE: A Multi-Task Benchmark and An..."
...Instead we use the reformatted WNLI data from SuperGLUE (Wang et al., 2019a), which indicates the span of the query pronoun and referent....
[...]
...GLUE The General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019b) is a collection of 9 datasets for evaluating natural language understanding systems.6 Tasks are framed as either single-sentence classification or sentence-pair classification tasks....
[...]
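The excerpt above notes that GLUE tasks are framed as either single-sentence or sentence-pair classification. A minimal sketch of that framing, assuming a toy BERT-style `[CLS]`/`[SEP]` input format (the task groupings follow the standard GLUE task list; the string format is illustrative, not any library's API):

```python
# Illustrative sketch of the two GLUE task framings: single-sentence
# classification (e.g. CoLA, SST-2) vs sentence-pair classification
# (e.g. MNLI, QNLI). The [CLS]/[SEP] string is a toy stand-in for
# real tokenization.
SINGLE_SENTENCE = {"cola", "sst2"}
SENTENCE_PAIR = {"mnli", "qnli", "qqp", "mrpc", "stsb", "rte", "wnli"}

def frame_example(task, text_a, text_b=None):
    """Build a BERT-style input string for the given GLUE task."""
    if task in SINGLE_SENTENCE:
        return f"[CLS] {text_a} [SEP]"
    if task in SENTENCE_PAIR:
        if text_b is None:
            raise ValueError("sentence-pair tasks need two sentences")
        return f"[CLS] {text_a} [SEP] {text_b} [SEP]"
    raise ValueError(f"unknown GLUE task: {task}")
```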
4,798 citations
4,505 citations
Cites background or methods from "GLUE: A Multi-Task Benchmark and An..."
...It matches the performance of RoBERTa (Liu et al., 2019) with comparable training resources on GLUE (Wang et al., 2018) and SQuAD (Rajpurkar et al., 2016), and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks....
[...]
...Tables 3 and 2 compare the performance of BART with several recent approaches on the well-studied SQuAD and GLUE tasks (Warstadt et al., 2018; Socher et al., 2013; Dolan & Brockett, 2005; Agirre et al., 2007; Williams et al., 2017; Dagan et al., 2006; Levesque et al., 2011)....
[...]
3,877 citations
Cites background or result from "GLUE: A Multi-Task Benchmark and An..."
...General Language Understanding We assess the language understanding and generalization capabilities of DistilBERT on the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018], a collection of 9 datasets for evaluating natural language understanding systems. We report scores on the development sets for each task by fine-tuning DistilBERT without ensembling or a multi-task fine-tuning scheme (which are mostly orthogonal to the present work). We compare the results to the baseline provided by the authors of GLUE: an ELMo (Peters et al. [2018]) encoder followed by two BiLSTMs....
[...]