Proceedings Article

Easy Victories and Uphill Battles in Coreference Resolution

01 Oct 2013, pp. 1971-1982
TL;DR: This work presents a state-of-the-art coreference system that captures various syntactic, discourse, and semantic phenomena implicitly, with a small number of homogeneous feature templates examining shallow properties of mentions, allowing it to win “easy victories” without crafted heuristics.
Abstract: Classical coreference systems encode various syntactic, discourse, and semantic phenomena explicitly, using heterogeneous features computed from hand-crafted heuristics. In contrast, we present a state-of-the-art coreference system that captures such phenomena implicitly, with a small number of homogeneous feature templates examining shallow properties of mentions. Surprisingly, our features are actually more effective than the corresponding hand-engineered ones at modeling these key linguistic phenomena, allowing us to win “easy victories” without crafted heuristics. These features are successful on syntax and discourse; however, they do not model semantic compatibility well, nor do we see gains from experiments with shallow semantic features from the literature, suggesting that this approach to semantics is an “uphill battle.” Nonetheless, our final system outperforms the Stanford system (Lee et al. (2011), the winner of the CoNLL 2011 shared task) by 3.5% absolute on the CoNLL metric and outperforms the IMS system (Björkelund and Farkas (2012), the best publicly available English coreference system) by 1.9% absolute.
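
The model behind these numbers is a mention-ranking log-linear model (see the quoted passages under References below): each mention i picks at most one antecedent from the mentions before it, or starts a new cluster. A minimal sketch of that formulation; the notation is ours, not copied from the paper:

    P(a_i = j \mid x) \propto \exp\big(w^\top f(i, j, x)\big), \qquad j \in \{1, \dots, i-1\} \cup \{\textsc{new}\}

Training maximizes a regularized conditional likelihood in which the three error types (false anaphor, false new, wrong link) are reweighted by the constants (αFA, αFN, αWL) quoted below, and the objective is optimized with AdaGrad.
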
Citations
Proceedings ArticleDOI
15 Feb 2018
TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
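
Concretely, the representation for token k is a learned, task-specific collapse of all L biLM layers (h_{k,0} is the context-independent token embedding, and each s_j is a softmax-normalized scalar weight):

    \text{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, h_{k,j}^{LM}

The scalar γ^{task} lets the downstream model rescale the whole vector; weighting all layers rather than using only the top one is the “deep internals” point made in the abstract.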

7,412 citations

Posted Content
TL;DR: This article introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

1,696 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work introduces the first end-to-end coreference resolution model, which is trained to maximize the marginal likelihood of gold antecedent spans from coreference clusters and is factored to enable aggressive pruning of potential mentions.
Abstract: We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser or hand-engineered mention detector. The key idea is to directly consider all spans in a document as potential mentions and learn distributions over possible antecedents for each. The model computes span embeddings that combine context-dependent boundary representations with a head-finding attention mechanism. It is trained to maximize the marginal likelihood of gold antecedent spans from coreference clusters and is factored to enable aggressive pruning of potential mentions. Experiments demonstrate state-of-the-art performance, with a gain of 1.5 F1 on the OntoNotes benchmark and by 3.1 F1 using a 5-model ensemble, despite the fact that this is the first approach to be successfully trained with no external resources.
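
The factorization that enables the aggressive pruning mentioned above is worth spelling out: the pairwise coreference score decomposes into two unary mention scores plus an antecedent score, so low-scoring spans can be discarded before any pairwise computation. In the paper's notation:

    s(i, j) = s_m(i) + s_m(j) + s_a(i, j), \qquad s(i, \varepsilon) = 0

    P(y_i = j) = \frac{\exp\big(s(i, j)\big)}{\sum_{j' \in \mathcal{Y}(i)} \exp\big(s(i, j')\big)}

where ε is the dummy antecedent taken by non-anaphoric or non-mention spans, and training maximizes the marginal likelihood of all antecedents consistent with the gold clusters.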

705 citations


Cites background or methods from "Easy Victories and Uphill Battles in Coreference Resolution"

  • ...The attention component is inspired by parser-derived head-word matching features from previous systems (Durrett and Klein, 2013), but is less susceptible to cascading errors....

  • ..., 2012; Björkelund and Kuhn, 2014; Martschat and Strube, 2015), or (4) mention-ranking models (Durrett and Klein, 2013; Wiseman et al., 2015; Clark and Manning, 2016a)....

Proceedings ArticleDOI
01 Jun 2018
TL;DR: A data-augmentation approach is demonstrated that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by rule-based, feature-rich, and neural coreference systems in WinoBias without significantly affecting their performance on existing datasets.
Abstract: In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
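
The augmentation itself is rule-based gender swapping: build a second copy of the training data with gendered words flipped, then train on the union. A minimal sketch under that reading; the word list and example are illustrative stand-ins, not the authors' resources (a real lexicon must also handle names and the his/her possessive ambiguity):

    # Illustrative swap list only; not the lexicon used in the paper.
    SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
            "his": "her", "himself": "herself", "herself": "himself"}

    def swap_gender(tokens):
        """Return a copy of a tokenized sentence with gendered words flipped,
        preserving the capitalization of each original token."""
        out = []
        for tok in tokens:
            repl = SWAP.get(tok.lower(), tok)
            out.append(repl.capitalize() if tok[:1].isupper() else repl)
        return out

    corpus = [["He", "asked", "the", "nurse", "to", "help", "him"]]
    augmented = corpus + [swap_gender(s) for s in corpus]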

562 citations


Cites background or methods from "Easy Victories and Uphill Battles in Coreference Resolution"

  • ..., 2010), feature-based (Durrett and Klein, 2013; Peng et al., 2015a), and neural network based (Clark and Manning, 2016; Lee et al....

  • ...In this section we evaluate three representative systems: rule based, Rule, (Raghunathan et al., 2010), feature-rich, Feature, (Durrett and Klein, 2013), and end-to-end neural (the current state-of-the-art), E2E, (Lee et al., 2017)....

  • ...cal examples: the Stanford Deterministic Coreference System (Raghunathan et al., 2010), the Berkeley Coreference Resolution System (Durrett and Klein, 2013) and the current best published system: the UW End-to-end Neural Coreference Resolution System (Lee et al., 2017)....

  • ..., 2010), feature-rich, Feature, (Durrett and Klein, 2013), and end-to-end neural (the current state-of-the-art), E2E, (Lee et al....

  • ..., 2010), the Berkeley Coreference Resolution System (Durrett and Klein, 2013) and the current best published system: the UW End-to-end Neural Coreference Resolution System (Lee et al....

Posted Content
TL;DR: The authors introduced WinoGrande, a large-scale dataset of 44k problems, inspired by the original Winograd Schema Challenge (WSC) design, but adjusted to improve both the scale and the hardness of the dataset.
Abstract: The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises an important question whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. The best state-of-the-art methods on WinoGrande achieve 59.4-79.1%, which are 15-35% below human performance of 94.0%, depending on the amount of the training data allowed. Furthermore, we establish new state-of-the-art results on five related benchmarks - WSC (90.1%), DPR (93.1%), COPA (90.6%), KnowRef (85.6%), and Winogender (97.1%). These results have dual implications: on one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, they raise a concern that we are likely to be overestimating the true capabilities of machine commonsense across all these benchmarks. We emphasize the importance of algorithmic bias reduction in existing and future benchmarks to mitigate such overestimation.
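
The AfLite step can be read as an ensemble-filtering loop: repeatedly fit cheap linear probes on random subsets of precomputed embeddings, score every held-out instance by how often the probes classify it correctly, and discard the most predictable instances. The sketch below is a schematic reading of the abstract, not the authors' reference implementation; all hyperparameter values here are made up, and each random split is assumed to contain both labels:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def aflite(X, y, n_rounds=5, n_models=64, train_frac=0.5, cut_frac=0.1):
        """Filter instances that linear probes find too predictable.
        X: (n, d) precomputed embeddings; y: (n,) binary labels."""
        keep = np.arange(len(y))
        for _ in range(n_rounds):
            hits = np.zeros(len(keep))
            counts = np.zeros(len(keep))
            for _ in range(n_models):
                # Random train/held-out split over the surviving instances.
                perm = np.random.permutation(len(keep))
                n_tr = int(train_frac * len(keep))
                tr, ho = perm[:n_tr], perm[n_tr:]
                clf = LogisticRegression(max_iter=200)
                clf.fit(X[keep[tr]], y[keep[tr]])
                hits[ho] += clf.predict(X[keep[ho]]) == y[keep[ho]]
                counts[ho] += 1
            # Drop the slice of instances the probes get right most often.
            predictability = hits / np.maximum(counts, 1)
            n_cut = int(cut_frac * len(keep))
            keep = keep[np.sort(np.argsort(-predictability)[n_cut:])]
        return keep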

366 citations

References
Proceedings Article
01 Jan 2010
TL;DR: Adaptive subgradient methods as discussed by the authors dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing them to find needles in haystacks in the form of very predictive but rarely seen features.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
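
The per-coordinate update behind this family is compact enough to state directly (diagonal AdaGrad, in its standard form):

    x_{t+1,\,i} = x_{t,\,i} - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_{\tau,\,i}^{2}}} \, g_{t,\,i}

where g_t is the (sub)gradient at step t. Coordinates that receive frequent, large gradients get small effective step sizes, while rare but predictive features keep large ones; this is the "needles in haystacks" behavior the abstract describes, and the reason the citing paper uses it for its sparse feature templates.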

7,244 citations

Journal Article
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.

6,984 citations


"Easy Victories and Uphill Battles i..." refers methods in this paper

  • ...001 and optimize the objective using AdaGrad (Duchi et al., 2011)....

  • ...We set (αFA, αFN, αWL) = (0.1, 3.0, 1.0) and λ = 0.001 and optimize the objective using AdaGrad (Duchi et al., 2011)....

ReportDOI
TL;DR: Interactions between local coherence and choice of referring expressions are examined; it is argued that differences in coherence correspond in part to the inference demands made by different types of referring expression, given a particular attentional state.
Abstract: This paper concerns relationships among focus of attention, choice of referring expression, and perceived coherence of utterances within a discourse segment. It presents a framework and initial theory of centering intended to model the local component of attentional state. The paper examines interactions between local coherence and choice of referring expressions; it argues that differences in coherence correspond in part to the inference demands made by different types of referring expressions, given a particular attentional state. It demonstrates that the attentional state properties modeled by centering can account for these differences.

1,994 citations


"Easy Victories and Uphill Battles i..." refers background in this paper

  • ...And finally, rather than targeting centering theory (Grosz et al., 1995) with rule-based features identifying syntactic positions (Stoyanov et al., 2010; Haghighi and Klein, 2010), our features on word context can identify configurational clues like whether a mention is preceded or followed by a....

Journal ArticleDOI
TL;DR: A learning approach to coreference resolution of noun phrases in unrestricted text is presented, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches.
Abstract: In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of "organization," "person," or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.
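
The decoding convention in this line of work, and the one the main paper's ranking model replaces (see the quoted passage below on independent binary decisions), is closest-first pairwise linking: walk backwards from each mention and attach it to the first preceding mention the binary classifier accepts. A minimal sketch, with is_coreferent standing in for any trained pairwise classifier; the toy classifier at the end is purely illustrative:

    def closest_first_links(mentions, is_coreferent):
        """Link each mention to the nearest preceding mention that the
        pairwise classifier accepts, or to None if all are rejected."""
        links = {}
        for i in range(len(mentions)):
            links[i] = None
            for j in range(i - 1, -1, -1):  # scan candidates right-to-left
                if is_coreferent(mentions[j], mentions[i]):
                    links[i] = j
                    break
        return links

    mentions = ["Barack Obama", "the president", "he", "Michelle Obama"]
    links = closest_first_links(
        mentions, lambda ante, ana: ana == "he" and "Obama" not in ante)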

1,059 citations


"Easy Victories and Uphill Battles i..." refers background or methods in this paper

  • ...However, the semantic information contained even in a coreference corpus of thousands of documents is insufficient to generalize to unseen data, so system designers have turned to external resources such as semantic classes derived from WordNet (Soon et al., 2001), WordNet hypernymy or synonymy (Stoyanov et al., 2010), semantic similarity computed from online resources (Ponzetto and Strube, 2006), named entity type features, gender and number match using the dataset of Bergsma and Lin (2006), and features from unsupervised clusters (Hendrickx and Daelemans, 2007; Durrett et al., 2013)....

  • ...is insufficient to generalize to unseen data, so system designers have turned to external resources such as semantic classes derived from WordNet (Soon et al., 2001), WordNet hypernymy or synonymy (Stoyanov et al., 2010), semantic similarity computed from online resources (Ponzetto and Strube, ....

  • ...In this section, we consider the following subset of these information sources: • WordNet hypernymy and synonymy • Number and gender data for nominals and propers from Bergsma and Lin (2006) • Named entity types • Latent clusters computed from English Gigaword (Graff et al., 2007), where a latent cluster label generates each nominal head (excluding pronouns) and a conjunction of its verbal governor and semantic role, if any (Durrett et al., 2013)....

  • ...Unlike binary classification-based coreference systems where independent binary decisions are made about each pair (Soon et al., 2001; Bengtson and Roth, 2008; Versley et al., 2008; Stoyanov et al., 2010), we use a log-linear model to select at most one antecedent for...

  • ...However, the semantic information contained even in a coreference corpus of thousands of documents is insufficient to generalize to unseen data, so system designers have turned to external resources such as semantic classes derived from WordNet (Soon et al., 2001), WordNet hypernymy or synonymy (Stoyanov et al....

Proceedings ArticleDOI
04 Jun 2006
TL;DR: The OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement, are described; an initial portion will be made available to the community during 2007.
Abstract: We describe the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007.

931 citations


"Easy Victories and Uphill Battles i..." refers methods in this paper

  • ...Throughout this work, we use the datasets from the CoNLL 2011 shared task2 (Pradhan et al., 2011), which is derived from the OntoNotes corpus (Hovy et al., 2006)....

  • ...which is derived from the OntoNotes corpus (Hovy et al., 2006)....
