Topic-Grained Text Representation-Based Model for Document Retrieval
Suzan Uysal, Ahmed Hazazi, et al.
pp. 776–788
TL;DR: In this article, a Topic-Grained Text Representation-based Model for Document Retrieval (TGTR) is proposed to reduce storage requirements by using novel topic-grained representations.
Abstract:
Document retrieval enables users to find the documents they need accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, while significantly reducing storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that, compared to word-grained baselines, TGTR is consistently competitive on TREC CAR and MS MARCO in terms of retrieval accuracy, yet requires less than 1/10 of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
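The storage trade-off described in the abstract can be illustrated with a small sketch. Word-grained methods keep one vector per document token, while a topic-grained scheme keeps only a handful of vectors per document. The compression step below (a tiny k-means over token vectors) is an illustrative stand-in, not the paper's actual model; all names and sizes are assumptions.

```python
import numpy as np

def word_grained_reps(token_embs):
    # Word-grained storage: keep every token vector (n_tokens x dim).
    return token_embs

def topic_grained_reps(token_embs, k=8, iters=10, seed=0):
    # Illustrative topic-grained storage: compress the token vectors
    # into k centroids with a tiny k-means (k x dim).
    # NOT the paper's method -- just a sketch of the storage idea.
    rng = np.random.default_rng(seed)
    cents = token_embs[rng.choice(len(token_embs), k, replace=False)]
    for _ in range(iters):
        dists = ((token_embs[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = token_embs[assign == j]
            if len(members):
                cents[j] = members.mean(0)
    return cents

doc = np.random.default_rng(1).normal(size=(120, 64))  # 120 tokens, dim 64
word = word_grained_reps(doc)
topic = topic_grained_reps(doc, k=8)
print(word.shape, topic.shape)  # storage drops from 120 vectors to 8
```

With 120 tokens compressed to 8 topic vectors, offline storage shrinks by a factor of 15 in this toy setting, which is the kind of saving the abstract's "less than 1/10" figure refers to.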
References
Proceedings Article
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych
TL;DR: Presents Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
Proceedings Article
A Deep Relevance Matching Model for Ad-hoc Retrieval
TL;DR: Proposes a deep relevance matching model (DRMM) for ad-hoc retrieval that employs a joint deep architecture at the query-term level and significantly outperforms well-known retrieval models as well as state-of-the-art deep matching models.
Proceedings Article
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick S. H. Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
TL;DR: In this paper, dense representations for passage retrieval are learned from a small number of questions and passages with a simple dual-encoder framework, which greatly outperforms a strong Lucene-BM25 system.
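The dual-encoder retrieval summarized above reduces, at query time, to an inner-product search over pre-stored passage embeddings. The sketch below uses random vectors as stand-ins for encoder outputs; the function name and sizes are illustrative assumptions, not DPR's actual API.

```python
import numpy as np

def retrieve(query_emb, passage_embs, top_k=2):
    # Dense retrieval: score every pre-stored passage embedding by
    # inner product with the query embedding, return the best indices.
    scores = passage_embs @ query_emb
    return np.argsort(-scores)[:top_k]

rng = np.random.default_rng(0)
passages = rng.normal(size=(1000, 128))             # offline: precomputed passage reps
query = passages[42] + 0.01 * rng.normal(size=128)  # a query close to passage 42
print(retrieve(query, passages))                    # passage 42 ranks first
```

In practice this exhaustive dot-product search is replaced by an approximate nearest-neighbor index over the same precomputed embeddings, which is what makes the offline-storage paradigm fast online.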
Proceedings Article
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab, Matei Zaharia
TL;DR: Presents ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval; it is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
Proceedings Article
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, Haifeng Wang
TL;DR: This work proposes an optimized training approach, called RocketQA, to improve dense passage retrieval; it significantly outperforms previous state-of-the-art models on both MS MARCO and Natural Questions, and demonstrates that end-to-end QA performance can be improved by building on the RocketQA retriever.