Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

doi:10.18653/V1/D19-1410

Open Access Proceedings Article

Nils Reimers, +1 more

- pp 3980-3990

TLDR

This paper presents Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.

Abstract:

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, both models require that both sentences be fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
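The "about 50 million" figure in the abstract follows from counting unordered pairs: n(n-1)/2 with n = 10,000. The sketch below reproduces that count and shows how precomputed embeddings can be compared with cosine similarity. It is a minimal illustration in plain Python, not the authors' implementation: the function names and the toy 3-dimensional vectors are made up for this example, and a real system would obtain the embeddings from a sentence encoder and use vectorized math.

```python
import math
from itertools import combinations


def num_pairs(n: int) -> int:
    """Number of unordered sentence pairs among n sentences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair of vectors with the highest cosine similarity."""
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Pairwise (cross-encoder) setting: one network pass per sentence pair.
print(num_pairs(10_000))  # 49995000, i.e. roughly 50 million

# Embedding setting: one network pass per sentence, then cheap vector comparisons.
# Hypothetical toy vectors standing in for real sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
i, j, score = most_similar_pair(toy_embeddings)
print(i, j)  # indices of the closest pair
```

The comparison loop still visits every pair, but each step is a dot product over fixed vectors rather than a full network pass, which is where the reported speedup comes from.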

Citations

Journal Article

Deep Learning--based Text Classification: A Comprehensive Review

Shervin Minaee, +5 more

- 17 Apr 2021 -

ACM Computing Surveys

TL;DR: This paper provided a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and discussed their technical contributions, similarities, and strengths, and provided a quantitative analysis of the performance of different deep learning models on popular benchmarks.

Posted Content

Language-agnostic BERT Sentence Embedding

Fangxiaoyu Feng, +4 more

- 03 Jul 2020 -

arXiv: Computation and Language

TL;DR: It is shown that introducing a pre-trained multilingual language model dramatically reduces the amount of parallel training data required to achieve good performance by 80%, and a model that achieves 83.7% bi-text retrieval accuracy over 112 languages on Tatoeba is released.

Posted Content

Deep Learning Based Text Classification: A Comprehensive Review

Shervin Minaee, +5 more

- 06 Apr 2020 -

arXiv: Computation and Language

TL;DR: A comprehensive review of more than 150 deep learning--based models for text classification developed in recent years is provided, and their technical contributions, similarities, and strengths are discussed.

Posted Content

Sparse, Dense, and Attentional Representations for Text Retrieval

Yi Luan, +3 more

- 01 May 2020 -

arXiv: Computation and Language

TL;DR: A simple neural model is proposed that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and sparse-dense hybrids are explored to capitalize on the precision of sparse retrieval.

Proceedings Article

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Wenlong Huang, +3 more

TL;DR: This paper investigates the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps and proposes a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.

References

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

TL;DR: BERT, as presented in this paper, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, +9 more

- 26 Jul 2019 -

arXiv: Computation and Language

TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Proceedings Article

FaceNet: A unified embedding for face recognition and clustering

Florian Schroff, +2 more

TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.

arXiv: Computation and Language

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more
