Open Access Book Chapter DOI

Topic-Grained Text Representation-Based Model for Document Retrieval

TLDR
In this article, a Topic-Grained Text Representation-based Model for Document Retrieval (TGTR) is proposed to reduce the storage requirements by using novel topic-grained representations.
Abstract
Document retrieval enables users to find the documents they need accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces the storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that TGTR is consistently competitive with word-grained baselines on TREC CAR and MS MARCO in terms of retrieval accuracy, while requiring less than one tenth of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
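The storage trade-off described in the abstract can be illustrated with a toy sketch. This is not TGTR's actual encoder: the token count, embedding dimensionality, number of topics, and the chunk-and-mean-pool topic assignment below are all illustrative assumptions, since the real model learns its topic-grained representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a document of 200 tokens, each represented by a
# 128-dim contextual embedding (word-grained representation).
n_tokens, dim, n_topics = 200, 128, 16
word_grained = rng.standard_normal((n_tokens, dim))

# A topic-grained representation keeps one vector per topic instead of
# one per token. Here we fake topic assignment by chunking tokens into
# contiguous groups and mean-pooling each group; the real model learns
# which tokens belong to which topic.
groups = np.array_split(word_grained, n_topics)
topic_grained = np.stack([g.mean(axis=0) for g in groups])

# Offline storage cost (number of floats) shrinks by n_tokens / n_topics.
print(word_grained.size)   # 200 * 128 = 25600 floats
print(topic_grained.size)  # 16 * 128 = 2048 floats

# Online matching still works against the pre-stored vectors, e.g. with
# a max-dot-product over stored vectors per query token embedding.
query = rng.standard_normal((8, dim))  # 8 query token embeddings
score = (query @ topic_grained.T).max(axis=1).sum()
```

With these toy numbers the stored representation shrinks by a factor of 12.5, consistent with the order-of-magnitude saving the abstract reports.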



References
Proceedings Article DOI

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

TL;DR: Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
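The comparison step SBERT enables can be sketched in a few lines; the vectors below are stand-ins (in practice they would come from the SBERT encoder), and only the cosine-similarity comparison itself is shown.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the comparison used between SBERT embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embedding vectors; real ones come from the sentence encoder.
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 1.0])
w = np.array([0.0, 1.0, 0.0])

print(cosine_similarity(u, v))  # 1.0 (identical direction)
print(cosine_similarity(u, w))  # 0.0 (orthogonal)
```

Because each sentence is encoded once into a single vector, pairwise comparison reduces to this cheap vector operation rather than a full cross-encoder forward pass per pair.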
Proceedings Article DOI

A Deep Relevance Matching Model for Ad-hoc Retrieval

TL;DR: A novel deep relevance matching model (DRMM) for ad-hoc retrieval is proposed; it employs a joint deep architecture at the query term level for relevance matching and can significantly outperform well-known retrieval models as well as state-of-the-art deep matching models.
Proceedings Article DOI

Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: In this paper, a simple dual-encoder framework is proposed to learn dense passage representations from a small number of questions and passages; it greatly outperforms a strong Lucene-BM25 system.
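The dual-encoder interface can be sketched as follows. The "encoders" here are fixed random projections with mean-pooling, purely to illustrate the offline-index / online-query split; in DPR both encoders are trained BERT networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in encoders: separate projection tables for queries and passages,
# mimicking DPR's two distinct encoders (in reality, two BERT models).
dim, vocab = 64, 1000
q_proj = rng.standard_normal((vocab, dim))
p_proj = rng.standard_normal((vocab, dim))

def encode(token_ids: np.ndarray, proj: np.ndarray) -> np.ndarray:
    # Mean-pool token vectors into a single dense representation.
    return proj[token_ids].mean(axis=0)

# Offline: encode all passages once and store the dense index.
passages = [rng.integers(0, vocab, size=30) for _ in range(5)]
index = np.stack([encode(p, p_proj) for p in passages])

# Online: encode the query and retrieve by inner product.
query_vec = encode(rng.integers(0, vocab, size=8), q_proj)
scores = index @ query_vec
best = int(scores.argmax())
```

Because each passage collapses to one vector, retrieval is a single matrix-vector product (and scales to millions of passages with an approximate inner-product index).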
Proceedings Article DOI

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

TL;DR: ColBERT is presented, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval; it is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
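ColBERT's late interaction scores a query against pre-stored per-token document embeddings with a MaxSim operator. The sketch below shows that scoring step on random stand-in embeddings; the shapes are illustrative assumptions.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum similarity over all document token embeddings,
    then sum across query tokens."""
    sim = query_emb @ doc_emb.T          # (n_q, n_d) token-level similarities
    return float(sim.max(axis=1).sum())  # MaxSim per query token, summed

rng = np.random.default_rng(2)
q = rng.standard_normal((4, 32))   # 4 query token embeddings
d = rng.standard_normal((50, 32))  # 50 pre-stored document token embeddings
score = maxsim_score(q, d)
```

Storing one vector per document token is exactly the word-grained paradigm whose storage cost the TGTR abstract above targets: the late-interaction scoring stays cheap, but the offline index grows with document length.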
Proceedings Article DOI

RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: This work proposes an optimized training approach, called RocketQA, to improve dense passage retrieval; it significantly outperforms previous state-of-the-art models on both MS MARCO and Natural Questions, and demonstrates that end-to-end QA performance can be improved by building on the RocketQA retriever.