Curriculum Learning for Dense Retrieval Distillation
Hansi Zeng, Hamed Zamani, V. Vinay
TLDR
A generic curriculum learning based optimization framework called CL-DRD is proposed that controls the difficulty level of the training data produced by the re-ranking (teacher) model and iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it.
Abstract
Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
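The coarse-to-fine curriculum described in the abstract can be illustrated as a pairwise distillation loss whose difficulty grows with the granularity of the teacher's ranking groups. The following is a minimal sketch, not the authors' implementation; the hinge margin, the grouping scheme, and all variable names are illustrative assumptions:

```python
import torch

def pairwise_preference_loss(scores, groups, margin=1.0):
    """Hinge loss over document pairs where doc i's group outranks doc j's.
    scores: student relevance scores for the teacher-ranked docs (1-D tensor).
    groups: group index per doc; a lower index means higher teacher preference.
    Coarse groups yield few, easy pairs; finer groups add harder pairs."""
    loss, n = scores.new_zeros(()), 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if groups[i] < groups[j]:  # teacher prefers doc i over doc j
                loss = loss + torch.relu(margin - (scores[i] - scores[j]))
                n += 1
    return loss / max(n, 1)

# Curriculum: an early stage splits the teacher ranking into two coarse
# groups; later stages refine until every rank position is its own group.
scores = torch.tensor([2.0, 1.5, 0.3, -0.2], requires_grad=True)
coarse = [0, 0, 1, 1]  # early stage: top half vs. bottom half
fine   = [0, 1, 2, 3]  # late stage: full pairwise ordering
```

With the coarse grouping only cross-group pairs contribute, so nearby documents within a group are not forced apart; the fine grouping additionally penalizes within-group inversions, which is the harder requirement.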
Citations
Proceedings Article
PLAID: An Efficient Engine for Late Interaction Retrieval
TL;DR: The Performance-Optimized Late Interaction Driver (PLAID) engine is introduced, which uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7x on a GPU and 45x on a CPU against vanilla ColBERTv2.
Journal Article
Dense Text Retrieval based on Pretrained Language Models: A Survey
TL;DR: A comprehensive, practical reference on dense text retrieval is provided, covering the major progress in dense retrieval based on pretrained language models, with a focus on relevance matching.
Journal Article
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey
Xiaoyu Shen, Svitlana Vakulenko, Marco Del Tredici, Gianni Barlacchi, Bill Byrne, Adrià de Gispert +5 more
TL;DR: A thorough, structured overview of mainstream techniques for low-resource dense retrieval is presented, dividing the techniques into three main categories based on their required resources and highlighting open issues along with the pros and cons of each.
Proceedings Article
PROD: Progressive Distillation for Dense Retrieval
Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan +9 more
TL;DR: This work proposes PROD, a PROgressive Distillation method for dense retrieval, which consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student.
Journal Article
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval
TL;DR: This work demonstrates that MLM pre-trained transformers can be used to effectively encode text information into a single vector for dense retrieval.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
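The update rule summarized above (adaptive estimates of lower-order moments with bias correction) can be written down in a few lines. This is a minimal NumPy sketch using the paper's default hyperparameters; the toy quadratic objective and the function name are assumptions for demonstration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), with bias correction for step t >= 1."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)  # bias-corrected first moment
    v_hat = v / (1 - b2**t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) from x = 1.0.
x, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

Because `m_hat / sqrt(v_hat)` is roughly unit-scale, the per-step displacement is bounded by approximately the learning rate, which is the property that makes Adam's step sizes insensitive to gradient magnitude.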
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Journal Article
A vector space model for automatic indexing
Gerard Salton, A. Wong, C. S. Yang
TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
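The vector space model represents each document as a vector of term weights and compares documents by cosine similarity. The following is an illustrative sketch of that idea; the toy corpus, the tf·idf weighting variant, and whitespace tokenization are simplifying assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term by tf * idf; each document becomes a sparse
    vector in term space, stored as a dict term -> weight."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d.split()).items()}
            for d in docs]

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["dense retrieval model", "sparse retrieval index",
        "dense passage model"]
vecs = tfidf_vectors(docs)
```

On this toy corpus, the first and third documents share two idf-weighted terms and score a higher cosine than the first and second, which share only one.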
Proceedings Article
Curriculum learning
TL;DR: It is hypothesized that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
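The easy-to-hard schedule hypothesized above can be sketched as exposing the learner to a growing, difficulty-sorted prefix of the training data. This is a generic sketch; the stage count and the length-based difficulty measure are stand-in assumptions:

```python
def curriculum_stages(examples, difficulty, n_stages=3):
    """Sort examples from easy to hard by the given difficulty function,
    then return one growing prefix of the sorted data per stage."""
    ordered = sorted(examples, key=difficulty)
    return [ordered[: max(1, len(ordered) * (s + 1) // n_stages)]
            for s in range(n_stages)]

# Hypothetical difficulty proxy: longer sequences are "harder".
data = ["ab", "abcdef", "a", "abcd"]
stages = curriculum_stages(data, difficulty=len)
```

The final stage covers the full dataset, so the curriculum only changes the order and timing in which examples are introduced, not the data ultimately seen.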
Posted Content
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
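The triple loss mentioned in the summary combines a masked-LM term, a soft-target distillation term, and a cosine term aligning student and teacher representations. A hedged PyTorch sketch follows; the temperature value, the equal weighting of the three terms, and the tensor shapes are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def distil_triple_loss(s_logits, t_logits, s_hidden, t_hidden, labels, T=2.0):
    """DistilBERT-style objective: masked-LM cross-entropy on the student,
    KL distillation against temperature-softened teacher logits, and a
    cosine loss pulling student hidden states toward the teacher's."""
    mlm = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    align = 1.0 - F.cosine_similarity(s_hidden, t_hidden, dim=-1).mean()
    return mlm + kd + align  # equal weighting assumed for simplicity
```

When the student exactly matches the teacher (identical logits and hidden states), the distillation and cosine terms vanish and only the masked-LM cross-entropy remains.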