Open Access Book Chapter DOI

Topic-Grained Text Representation-Based Model for Document Retrieval

TLDR
In this article, a Topic-Grained Text Representation-based Model for Document Retrieval (TGTR) is proposed to reduce the storage requirements by using novel topic-grained representations.
Abstract
Document retrieval enables users to find the documents they need accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces the storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that TGTR is consistently competitive with word-grained baselines on TREC CAR and MS MARCO in terms of retrieval accuracy, while requiring less than one tenth of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
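The storage trade-off described in the abstract can be illustrated with a toy sketch. This is not TGTR's actual encoder: the token count, embedding dimensionality, number of topics, and the chunk-and-mean-pool topic assignment below are all illustrative assumptions, since the real model learns its topic-grained representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a document of 200 tokens, each represented by a
# 128-dim contextual embedding (word-grained representation).
n_tokens, dim, n_topics = 200, 128, 16
word_grained = rng.standard_normal((n_tokens, dim))

# A topic-grained representation keeps one vector per topic instead of
# one per token. Here we fake topic assignment by chunking tokens into
# contiguous groups and mean-pooling each group; the real model learns
# which tokens belong to which topic.
groups = np.array_split(word_grained, n_topics)
topic_grained = np.stack([g.mean(axis=0) for g in groups])

# Offline storage cost (number of floats) shrinks by n_tokens / n_topics.
print(word_grained.size)   # 200 * 128 = 25600 floats
print(topic_grained.size)  # 16 * 128 = 2048 floats

# Online matching still works against the pre-stored vectors, e.g. with
# a max-dot-product over stored vectors per query token embedding.
query = rng.standard_normal((8, dim))  # 8 query token embeddings
score = (query @ topic_grained.T).max(axis=1).sum()
```

With these toy numbers the stored representation shrinks by a factor of 12.5, consistent with the order-of-magnitude saving the abstract reports.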



References
Proceedings Article DOI

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

TL;DR: Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
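The comparison step SBERT enables can be sketched in a few lines; the vectors below are stand-ins (in practice they would come from the SBERT encoder), and only the cosine-similarity comparison itself is shown.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the comparison used between SBERT embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embedding vectors; real ones come from the sentence encoder.
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 1.0])
w = np.array([0.0, 1.0, 0.0])

print(cosine_similarity(u, v))  # 1.0 (identical direction)
print(cosine_similarity(u, w))  # 0.0 (orthogonal)
```

Because each sentence is encoded once into a single vector, pairwise comparison reduces to this cheap vector operation rather than a full cross-encoder forward pass per pair.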
Proceedings Article DOI

A Deep Relevance Matching Model for Ad-hoc Retrieval

TL;DR: A novel deep relevance matching model (DRMM) for ad-hoc retrieval is proposed; it employs a joint deep architecture at the query term level for relevance matching and can significantly outperform well-known retrieval models as well as state-of-the-art deep matching models.
Proceedings Article DOI

Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: In this paper, a simple dual-encoder framework is proposed to learn dense passage representations from a small number of questions and passages; it greatly outperforms a strong Lucene-BM25 system.
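The dual-encoder interface can be sketched as follows. The "encoders" here are fixed random projections with mean-pooling, purely to illustrate the offline-index / online-query split; in DPR both encoders are trained BERT networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in encoders: separate projection tables for queries and passages,
# mimicking DPR's two distinct encoders (in reality, two BERT models).
dim, vocab = 64, 1000
q_proj = rng.standard_normal((vocab, dim))
p_proj = rng.standard_normal((vocab, dim))

def encode(token_ids: np.ndarray, proj: np.ndarray) -> np.ndarray:
    # Mean-pool token vectors into a single dense representation.
    return proj[token_ids].mean(axis=0)

# Offline: encode all passages once and store the dense index.
passages = [rng.integers(0, vocab, size=30) for _ in range(5)]
index = np.stack([encode(p, p_proj) for p in passages])

# Online: encode the query and retrieve by inner product.
query_vec = encode(rng.integers(0, vocab, size=8), q_proj)
scores = index @ query_vec
best = int(scores.argmax())
```

Because each passage collapses to one vector, retrieval is a single matrix-vector product (and scales to millions of passages with an approximate inner-product index).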
Proceedings Article DOI

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

TL;DR: ColBERT is presented, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval; it is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
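ColBERT's late interaction scores a query against pre-stored per-token document embeddings with a MaxSim operator. The sketch below shows that scoring step on random stand-in embeddings; the shapes are illustrative assumptions.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum similarity over all document token embeddings,
    then sum across query tokens."""
    sim = query_emb @ doc_emb.T          # (n_q, n_d) token-level similarities
    return float(sim.max(axis=1).sum())  # MaxSim per query token, summed

rng = np.random.default_rng(2)
q = rng.standard_normal((4, 32))   # 4 query token embeddings
d = rng.standard_normal((50, 32))  # 50 pre-stored document token embeddings
score = maxsim_score(q, d)
```

Storing one vector per document token is exactly the word-grained paradigm whose storage cost the TGTR abstract above targets: the late-interaction scoring stays cheap, but the offline index grows with document length.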
Proceedings Article DOI

RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: This work proposes an optimized training approach, called RocketQA, to improve dense passage retrieval; it significantly outperforms previous state-of-the-art models on both MS MARCO and Natural Questions, and demonstrates that end-to-end QA performance can be improved by building on the RocketQA retriever.