Proceedings Article

*SEM 2013 shared task: Semantic Textual Similarity

TL;DR: The CORE task attracted 34 participants with 89 runs, and the TYPED task attracted 6 teams with 14 runs, with relatively high interannotator correlation, ranging from 62% to 87%.
Abstract: In Semantic Textual Similarity (STS), systems rate the degree of semantic equivalence on a graded scale from 0 to 5, with 5 being the most similar. This year we set up two tasks: (i) a core task (CORE), and (ii) a typed-similarity task (TYPED). CORE is similar in setup to the SemEval STS 2012 task, with pairs of sentences from sources related to those of 2012 yet different in genre from the 2012 set; this year we included newswire headlines, machine translation evaluation datasets and multiple lexical resource glossed sets. TYPED, on the other hand, is novel and tries to characterize why two items are deemed similar, using cultural heritage items which are described with metadata such as title, author or description. Several types of similarity have been defined, including similar author, similar time period or similar location. The annotation for both tasks leverages crowdsourcing, with relatively high interannotator correlation, ranging from 62% to 87%. The CORE task attracted 34 participants with 89 runs, and the TYPED task attracted 6 teams with 14 runs.
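Systems in STS are ranked by how well their graded scores correlate with the crowdsourced gold annotations, with Pearson correlation as the headline metric. A minimal sketch of that scoring step; the gold and system scores below are made-up illustrations, not task data:

```python
# A minimal sketch of STS evaluation: Pearson correlation between
# gold annotations and system predictions on the 0-5 graded scale.
# The numbers are illustrative, not taken from the actual dataset.
from scipy.stats import pearsonr

gold = [5.0, 3.2, 1.0, 0.0]        # hypothetical crowdsourced gold scores
predicted = [4.8, 2.9, 1.5, 0.4]   # hypothetical system outputs

r, _ = pearsonr(gold, predicted)
print(f"Pearson r = {r:.3f}")      # the per-dataset evaluation score
```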
Citations
Proceedings ArticleDOI
14 Aug 2019
TL;DR: Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
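The usage pattern SBERT enables is the key point: each sentence is encoded once into a fixed-size embedding, and similarity then reduces to cheap cosine comparisons. A minimal sketch using the sentence-transformers package; the model checkpoint named below is a commonly available one assumed for illustration, not necessarily the paper's original:

```python
# A minimal sketch of the SBERT pattern: encode once, compare with cosine.
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

sentences = ["A man is playing a guitar.",
             "Someone is strumming a guitar.",
             "A chef is cooking pasta."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the first sentence and the other two.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # higher score = more semantically similar
```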

4,020 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...We use the STS tasks 2012 - 2016 (Agirre et al., 2012, 2013, 2014, 2015, 2016), the STS benchmark (Cer et al., 2017), and the SICK-Relatedness dataset (Marelli et al., 2014)....


Proceedings ArticleDOI
TL;DR: The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017); error analysis provides insight into the limitations of existing models.
Abstract: Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

1,124 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...To encourage and support research in this area, the STS shared task has been held annually since 2012, providing a venue for evaluation of state-ofthe-art algorithms and models (Agirre et al., 2012, 2013, 2014, 2015, 2016)....



Proceedings Article
12 Feb 2016
TL;DR: A siamese adaptation of the Long Short-Term Memory network for labeled data comprised of pairs of variable-length sequences is presented, which compel the sentence representations learned by the model to form a highly structured space whose geometry reflects complex semantic relationships.
Abstract: We present a siamese adaptation of the Long Short-Term Memory (LSTM) network for labeled data comprised of pairs of variable-length sequences. Our model is applied to assess semantic similarity between sentences, where we exceed the state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. For these applications, we provide word-embedding vectors supplemented with synonymic information to the LSTMs, which use a fixed-size vector to encode the underlying meaning expressed in a sentence (irrespective of the particular wording/syntax). By restricting subsequent operations to rely on a simple Manhattan metric, we compel the sentence representations learned by our model to form a highly structured space whose geometry reflects complex semantic relationships. Our results are the latest in a line of findings that showcase LSTMs as powerful language models capable of tasks requiring intricate understanding.
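The similarity function at the heart of this model is simply the exponentiated negative Manhattan (L1) distance between the two LSTM encodings, which maps any pair of vectors into (0, 1]. A minimal sketch, assuming the siamese encoders have already produced fixed-size sentence vectors; the vectors below are made up:

```python
# A minimal sketch of the Manhattan similarity used by the siamese LSTM:
# exp(-||h_left - h_right||_1), which lies in (0, 1].
# The vectors stand in for LSTM sentence encodings, for illustration only.
import numpy as np

def manhattan_similarity(h_left: np.ndarray, h_right: np.ndarray) -> float:
    """Exponentiated negative L1 distance between two encodings."""
    return float(np.exp(-np.abs(h_left - h_right).sum()))

h1 = np.array([0.20, -0.50, 0.10])
h2 = np.array([0.25, -0.45, 0.05])
print(manhattan_similarity(h1, h2))  # close to 1 for near-identical encodings
```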

839 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...Then, our MaLSTM is (pre)trained as previously described on separate sentence-pair data provided for the earlier SemEval 2013 Semantic Textual Similarity task (Agirre and Cer 2013)....


Proceedings ArticleDOI
01 Jan 2016
TL;DR: Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.
Abstract: Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.

529 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...The STS shared task has been held annually since 2012, providing a venue for the evaluation of state-of-the-art algorithms and models (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015)....


References
Journal ArticleDOI
01 Sep 2000 - Language
TL;DR: The WordNet lexical database is presented, covering nouns, modifiers, and a semantic network of English verbs, together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet (George A. Miller); modifiers in WordNet (Katherine J. Miller); a semantic network of English verbs (Christiane Fellbaum); design and implementation of the WordNet lexical database and searching software (Randee I. Tengi). Part 2: automated discovery of WordNet relations (Marti A. Hearst); representing verb alternations in WordNet (Karen T. Kohl et al.); the formalization of WordNet by methods of relational concept analysis (Uta E. Priss). Part 3, Applications of WordNet: building semantic concordances (Shari Landes et al.); performance and confidence in a semantic annotation task (Christiane Fellbaum et al.); WordNet and class-based probabilities (Philip Resnik); combining local context and WordNet similarity for word sense identification (Claudia Leacock and Martin Chodorow); using WordNet for text retrieval (Ellen M. Voorhees); lexical chains as representations of context for the detection and correction of malapropisms (Graeme Hirst and David St-Onge); temporal indexing through lexical chaining (Reem Al-Halimi and Rick Kazman); COLOR-X: using knowledge from WordNet for conceptual modelling (J.F.M. Burg and R.P. van de Riet); knowledge processing on an extended WordNet (Sanda M. Harabagiu and Dan I. Moldovan). Appendix: obtaining and using WordNet.

13,049 citations


Additional excerpts

  • ...WordNet 3.0 (Fellbaum, 1998)....


Proceedings ArticleDOI
10 Aug 1998
TL;DR: This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.
Abstract: FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns, adjectives, and verbs) in which the semantic portion makes use of frame semantics. The resulting database will contain (a) descriptions of the semantic frames underlying the meanings of the words described, and (b) the valence representation (semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type, and other syntactic traits). This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

2,900 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...The FnWN subset has 189 manually mapped pairs of senses from FrameNet 1.5 (Baker et al., 1998) to WordNet 3.1....

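For concreteness, an FnWN-style pair places a FrameNet frame definition next to a WordNet gloss and asks how similar they are. A minimal sketch of retrieving such a pair via NLTK, assuming the WordNet and FrameNet corpora have been downloaded; the frame and synset chosen here are illustrative, not one of the task's 189 mapped pairs:

```python
# A minimal sketch of an FnWN-style pair: a FrameNet frame definition
# alongside a WordNet gloss. Assumes NLTK with the 'wordnet' and
# 'framenet_v15' (or v17) data packages downloaded; the frame and synset
# are illustrative choices, not an actual mapped pair from the task.
from nltk.corpus import framenet as fn
from nltk.corpus import wordnet as wn

frame = fn.frame("Commerce_buy")     # a FrameNet frame and its definition
synset = wn.synset("buy.v.01")       # a WordNet sense and its gloss

print(frame.definition)
print(synset.definition())
```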

Proceedings Article
08 Aug 2006
TL;DR: Translation Edit Rate (TER), a new, intuitive measure for evaluating machine translation output, is defined; it avoids both the knowledge intensiveness of more meaning-based approaches and the labor intensiveness of human judgments.
Abstract: We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU—even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as—or better than—a second human judgment does.
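TER counts the word-level edits (insertions, deletions, substitutions, plus block shifts of phrases) needed to turn a system output into the reference, normalized by reference length. A simplified sketch that omits the shift operation, so it only upper-bounds true TER:

```python
# A simplified sketch of TER: edits to turn the hypothesis into the
# reference, divided by reference length. Real TER also allows block
# shifts of phrases; this word-level edit distance omits them, so it
# upper-bounds true TER. For illustration only.
def simple_ter(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # Standard Levenshtein distance over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)] / len(ref)

# One missing word against a 6-word reference: 1/6 ~ 0.167
print(simple_ter("the cat sat on mat", "the cat sat on the mat"))
```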

2,210 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...Both metrics use the TER metric (Snover et al., 2006) to measure the similarity of pairs....


Proceedings ArticleDOI
04 Jun 2006
TL;DR: The OntoNotes methodology and its result are described: a large multilingual richly-annotated corpus constructed at 90% interannotator agreement, which will be made available to the community during 2007.
Abstract: We describe the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007.

931 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...The OnWN subset comprises 561 gloss pairs from OntoNotes 4.0 (Hovy et al., 2006) and WordNet 3.0 (Fellbaum, 1998)....


  • ...We are grateful to the OntoNotes team for sharing OntoNotes to WordNet mappings (Hovy et al., 2006)....
