Open Access Proceedings Article

Curriculum Learning for Dense Retrieval Distillation

TLDR
A generic curriculum-learning-based optimization framework called CL-DRD is proposed that controls the difficulty level of training data produced by the re-ranking (teacher) model and iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it.
Abstract
Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
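To make the curriculum concrete, below is a minimal sketch (not the authors' released code) of how preference pairs of increasing difficulty could be derived from a teacher's ranking: early stages group ranked documents into coarse buckets and only require cross-bucket preferences, while later stages shrink the buckets toward a full pairwise ordering. The bucket sizes, pair-generation rule, and names such as `preference_pairs` and `group_size` are illustrative assumptions.

```python
# Minimal sketch of curriculum-style pair generation from a teacher ranking.
# Bucket sizes and sampling rule are assumptions, not the paper's exact recipe.
from itertools import product
from typing import List, Tuple


def preference_pairs(teacher_ranking: List[str], group_size: int) -> List[Tuple[str, str]]:
    """Split the teacher's ranked list into buckets of `group_size` documents and
    emit (preferred, non-preferred) pairs only across buckets, so documents inside
    the same bucket are treated as equally relevant at this difficulty level."""
    buckets = [teacher_ranking[i:i + group_size]
               for i in range(0, len(teacher_ranking), group_size)]
    pairs = []
    for hi in range(len(buckets)):
        for lo in range(hi + 1, len(buckets)):
            pairs.extend(product(buckets[hi], buckets[lo]))
    return pairs


# Curriculum schedule: shrinking the bucket size tightens the ordering constraints,
# moving from coarse preferences toward (near) total pairwise ordering.
ranking = [f"doc{r}" for r in range(8)]      # teacher's top-8 for one query
for group_size in (4, 2, 1):                 # easy -> hard stages
    pairs = preference_pairs(ranking, group_size)
    print(f"group_size={group_size}: {len(pairs)} training pairs")
    # train_student_on(pairs)  # placeholder for one distillation stage
```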



Citations
Proceedings Article

PLAID: An Efficient Engine for Late Interaction Retrieval

TL;DR: The Performance-Optimized Late Interaction Driver (PLAID) engine is introduced, which uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly optimized engine to reduce late interaction search latency by up to 7x on a GPU and 45x on a CPU against vanilla ColBERTv2.
Journal Article

Dense Text Retrieval based on Pretrained Language Models: A Survey

TL;DR: This article presents a comprehensive survey of dense text retrieval based on pretrained language models, providing a practical reference on the major progress in the area, with a focus on relevance matching.
Journal Article

Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey

TL;DR: A thorough structured overview of mainstream techniques for low-resource DR, dividing the techniques into three main categories based on their required resources, and highlighting the open issues and pros and cons.
Proceedings Article

PROD: Progressive Distillation for Dense Retrieval

TL;DR: This work proposes PROD, a PROgressive Distillation method for dense retrieval, which consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student.
Journal Article

Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval

TL;DR: This work demonstrates that MLM pre-trained transformers can be used to effectively encode text information into a single-vector for dense retrieval.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
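As a refresher on the update rule summarized above, here is a toy NumPy sketch of one Adam step with bias-corrected first- and second-moment estimates; the default hyperparameters follow the values commonly cited from the paper, and the usage example is purely illustrative.

```python
# Toy sketch of the Adam update: adaptive moment estimates with bias correction.
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad` at step t (1-based)."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = v = np.zeros_like(theta)
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)  # both entries have moved toward the minimizer at the origin
```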
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Journal Article

A vector space model for automatic indexing

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
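For context, the snippet below illustrates the basic vector space model this work builds on: documents and queries represented as term-weight vectors and ranked by cosine similarity. The tf-idf weighting shown is a common instantiation; the paper's space-density-based vocabulary selection is not reproduced here.

```python
# Minimal vector space model sketch: tf-idf term vectors ranked by cosine similarity.
import math
from collections import Counter

docs = ["the cat sat on the mat", "dogs and cats", "a survey of dense retrieval"]
tokenized = [d.split() for d in docs]
vocab = sorted({w for d in tokenized for w in d})
idf = {w: math.log(len(docs) / sum(w in d for d in tokenized)) for w in vocab}


def vectorize(tokens):
    tf = Counter(tokens)
    return [tf[w] * idf[w] for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query_vec = vectorize("cat on mat".split())
for doc, tokens in zip(docs, tokenized):
    print(f"{cosine(query_vec, vectorize(tokens)):.3f}  {doc}")
```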
Proceedings Article

Curriculum learning

TL;DR: It is hypothesized that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
Posted Content

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performances on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
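The triple loss mentioned above can be sketched as follows: a masked language modeling term, a soft-target distillation term between teacher and student logits, and a cosine loss aligning their hidden states. The loss weights and temperature below are illustrative assumptions rather than the paper's exact settings, and the helper `distilbert_loss` is hypothetical.

```python
# Sketch of a triple loss combining MLM, distillation, and cosine-alignment terms.
# Weights (alpha_*) and temperature are assumptions, not the paper's exact values.
import torch
import torch.nn.functional as F


def distilbert_loss(student_logits, teacher_logits, mlm_labels,
                    student_hidden, teacher_hidden,
                    temperature=2.0, alpha_mlm=1.0, alpha_kd=1.0, alpha_cos=1.0):
    # (1) standard MLM loss on masked positions (labels of -100 are ignored)
    loss_mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                               mlm_labels.view(-1), ignore_index=-100)
    # (2) distillation: KL divergence between temperature-softened distributions
    loss_kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                       F.softmax(teacher_logits / temperature, dim=-1),
                       reduction="batchmean") * temperature ** 2
    # (3) cosine loss pulling student hidden states toward the teacher's
    flat_student = student_hidden.view(-1, student_hidden.size(-1))
    flat_teacher = teacher_hidden.view(-1, teacher_hidden.size(-1))
    target = torch.ones(flat_student.size(0))
    loss_cos = F.cosine_embedding_loss(flat_student, flat_teacher, target)
    return alpha_mlm * loss_mlm + alpha_kd * loss_kd + alpha_cos * loss_cos
```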