Proceedings Article

*SEM 2013 shared task: Semantic Textual Similarity

TL;DR: The CORE task attracted 34 participants with 89 runs, and the TYPED task attracted 6 teams with 14 runs, with relatively high interannotator correlation, ranging from 62% to 87%.
Abstract: In Semantic Textual Similarity (STS), systems rate the degree of semantic equivalence on a graded scale from 0 to 5, with 5 being the most similar. This year we set up two tasks: (i) a core task (CORE), and (ii) a typed-similarity task (TYPED). CORE is similar in setup to the SemEval STS 2012 task, with pairs of sentences from sources related to those of 2012 yet different in genre from the 2012 set; this year we included newswire headlines, machine translation evaluation datasets and multiple lexical resource glossed sets. TYPED, on the other hand, is novel and tries to characterize why two items are deemed similar, using cultural heritage items which are described with metadata such as title, author or description. Several types of similarity have been defined, including similar author, similar time period or similar location. The annotation for both tasks leverages crowdsourcing, with relatively high interannotator correlation, ranging from 62% to 87%. The CORE task attracted 34 participants with 89 runs, and the TYPED task attracted 6 teams with 14 runs.
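Systems in STS are ranked by how well their graded scores correlate with the crowdsourced gold annotations, with Pearson correlation as the headline metric. A minimal sketch of that scoring step; the gold and system scores below are made-up illustrations, not task data:

```python
# A minimal sketch of STS evaluation: Pearson correlation between
# gold annotations and system predictions on the 0-5 graded scale.
# The numbers are illustrative, not taken from the actual dataset.
from scipy.stats import pearsonr

gold = [5.0, 3.2, 1.0, 0.0]        # hypothetical crowdsourced gold scores
predicted = [4.8, 2.9, 1.5, 0.4]   # hypothetical system outputs

r, _ = pearsonr(gold, predicted)
print(f"Pearson r = {r:.3f}")      # the per-dataset evaluation score
```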
Citations
Proceedings ArticleDOI
14 Aug 2019
TL;DR: Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
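The usage pattern SBERT enables is the key point: each sentence is encoded once into a fixed-size embedding, and similarity then reduces to cheap cosine comparisons. A minimal sketch using the sentence-transformers package; the model checkpoint named below is a commonly available one assumed for illustration, not necessarily the paper's original:

```python
# A minimal sketch of the SBERT pattern: encode once, compare with cosine.
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

sentences = ["A man is playing a guitar.",
             "Someone is strumming a guitar.",
             "A chef is cooking pasta."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the first sentence and the other two.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # higher score = more semantically similar
```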

4,020 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...We use the STS tasks 2012 - 2016 (Agirre et al., 2012, 2013, 2014, 2015, 2016), the STS benchmark (Cer et al., 2017), and the SICK-Relatedness dataset (Marelli et al., 2014)....


Proceedings ArticleDOI
TL;DR: The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017); error analysis provides insight into the limitations of existing models.
Abstract: Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

1,124 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...To encourage and support research in this area, the STS shared task has been held annually since 2012, providing a venue for evaluation of state-ofthe-art algorithms and models (Agirre et al., 2012, 2013, 2014, 2015, 2016)....



Proceedings Article
12 Feb 2016
TL;DR: A siamese adaptation of the Long Short-Term Memory network for labeled data comprised of pairs of variable-length sequences is presented, which compel the sentence representations learned by the model to form a highly structured space whose geometry reflects complex semantic relationships.
Abstract: We present a siamese adaptation of the Long Short-Term Memory (LSTM) network for labeled data comprised of pairs of variable-length sequences. Our model is applied to assess semantic similarity between sentences, where we exceed the state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. For these applications, we provide word-embedding vectors supplemented with synonymic information to the LSTMs, which use a fixed-size vector to encode the underlying meaning expressed in a sentence (irrespective of the particular wording/syntax). By restricting subsequent operations to rely on a simple Manhattan metric, we compel the sentence representations learned by our model to form a highly structured space whose geometry reflects complex semantic relationships. Our results are the latest in a line of findings that showcase LSTMs as powerful language models capable of tasks requiring intricate understanding.
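The similarity function at the heart of this model is simply the exponentiated negative Manhattan (L1) distance between the two LSTM encodings, which maps any pair of vectors into (0, 1]. A minimal sketch, assuming the siamese encoders have already produced fixed-size sentence vectors; the vectors below are made up:

```python
# A minimal sketch of the Manhattan similarity used by the siamese LSTM:
# exp(-||h_left - h_right||_1), which lies in (0, 1].
# The vectors stand in for LSTM sentence encodings, for illustration only.
import numpy as np

def manhattan_similarity(h_left: np.ndarray, h_right: np.ndarray) -> float:
    """Exponentiated negative L1 distance between two encodings."""
    return float(np.exp(-np.abs(h_left - h_right).sum()))

h1 = np.array([0.20, -0.50, 0.10])
h2 = np.array([0.25, -0.45, 0.05])
print(manhattan_similarity(h1, h2))  # close to 1 for near-identical encodings
```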

839 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...Then, our MaLSTM is (pre)trained as previously described on separate sentence-pair data provided for the earlier SemEval 2013 Semantic Textual Similarity task (Agirre and Cer 2013)....


Proceedings ArticleDOI
01 Jan 2016
TL;DR: Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.
Abstract: Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.

529 citations


Cites methods from "*SEM 2013 shared task: Semantic Textual Similarity"

  • ...The STS shared task has been held annually since 2012, providing a venue for the evaluation of state-of-the-art algorithms and models (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015)....


References
Journal ArticleDOI
01 Sep 2000 - Language
TL;DR: The WordNet lexical database is presented, covering nouns, modifiers, and a semantic network of English verbs, together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet (George A. Miller); modifiers in WordNet (Katherine J. Miller); a semantic network of English verbs (Christiane Fellbaum); design and implementation of the WordNet lexical database and searching software (Randee I. Tengi). Part 2: automated discovery of WordNet relations (Marti A. Hearst); representing verb alternations in WordNet (Karen T. Kohl et al.); the formalization of WordNet by methods of relational concept analysis (Uta E. Priss). Part 3, Applications of WordNet: building semantic concordances (Shari Landes et al.); performance and confidence in a semantic annotation task (Christiane Fellbaum et al.); WordNet and class-based probabilities (Philip Resnik); combining local context and WordNet similarity for word sense identification (Claudia Leacock and Martin Chodorow); using WordNet for text retrieval (Ellen M. Voorhees); lexical chains as representations of context for the detection and correction of malapropisms (Graeme Hirst and David St-Onge); temporal indexing through lexical chaining (Reem Al-Halimi and Rick Kazman); COLOR-X: using knowledge from WordNet for conceptual modelling (J.F.M. Burg and R.P. van de Riet); knowledge processing on an extended WordNet (Sanda M. Harabagiu and Dan I. Moldovan). Appendix: obtaining and using WordNet.

13,049 citations


Additional excerpts

  • ...WordNet 3.0 (Fellbaum, 1998)....


Proceedings ArticleDOI
10 Aug 1998
TL;DR: This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.
Abstract: FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns, adjectives, and verbs) in which the semantic portion makes use of frame semantics. The resulting database will contain (a) descriptions of the semantic frames underlying the meanings of the words described, and (b) the valence representation (semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type, and other syntactic traits). This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

2,900 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...The FnWN subset has 189 manually mapped pairs of senses from FrameNet 1.5 (Baker et al., 1998) to WordNet 3.1....

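For concreteness, an FnWN-style pair places a FrameNet frame definition next to a WordNet gloss and asks how similar they are. A minimal sketch of retrieving such a pair via NLTK, assuming the WordNet and FrameNet corpora have been downloaded; the frame and synset chosen here are illustrative, not one of the task's 189 mapped pairs:

```python
# A minimal sketch of an FnWN-style pair: a FrameNet frame definition
# alongside a WordNet gloss. Assumes NLTK with the 'wordnet' and
# 'framenet_v15' (or v17) data packages downloaded; the frame and synset
# are illustrative choices, not an actual mapped pair from the task.
from nltk.corpus import framenet as fn
from nltk.corpus import wordnet as wn

frame = fn.frame("Commerce_buy")     # a FrameNet frame and its definition
synset = wn.synset("buy.v.01")       # a WordNet sense and its gloss

print(frame.definition)
print(synset.definition())
```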

Proceedings Article
08 Aug 2006
TL;DR: Translation Edit Rate (TER), a new, intuitive measure for evaluating machine translation output, is defined; it avoids both the knowledge intensiveness of more meaning-based approaches and the labor intensiveness of human judgments.
Abstract: We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU—even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as—or better than—a second human judgment does.
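TER counts the word-level edits (insertions, deletions, substitutions, plus block shifts of phrases) needed to turn a system output into the reference, normalized by reference length. A simplified sketch that omits the shift operation, so it only upper-bounds true TER:

```python
# A simplified sketch of TER: edits to turn the hypothesis into the
# reference, divided by reference length. Real TER also allows block
# shifts of phrases; this word-level edit distance omits them, so it
# upper-bounds true TER. For illustration only.
def simple_ter(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # Standard Levenshtein distance over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)] / len(ref)

# One missing word against a 6-word reference: 1/6 ~ 0.167
print(simple_ter("the cat sat on mat", "the cat sat on the mat"))
```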

2,210 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...Both metrics use the TER metric (Snover et al., 2006) to measure the similarity of pairs....


Proceedings ArticleDOI
04 Jun 2006
TL;DR: The OntoNotes methodology and its result are described: a large multilingual richly-annotated corpus constructed at 90% interannotator agreement, which will be made available to the community during 2007.
Abstract: We describe the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007.

931 citations


"*SEM 2013 shared task: Semantic Tex..." refers methods in this paper

  • ...The OnWN subset comprises 561 gloss pairs from OntoNotes 4.0 (Hovy et al., 2006) and WordNet 3.0 (Fellbaum, 1998)....


  • ...We are grateful to the OntoNotes team for sharing OntoNotes to WordNet mappings (Hovy et al., 2006)....
