Open access | Journal Article | DOI: 10.1162/TACL_A_00356

Augmenting Transformers with KNN-Based Composite Memory for Dialog

04 Mar 2021 - Transactions of the Association for Computational Linguistics (MIT Press - Journals) - Vol. 9, pp. 82-99
Abstract: Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialog modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images, and human-written dialog utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.
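The read operation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear projection `W`, the dot-product similarity, and the softmax weighting of the retrieved items are all assumptions made here for illustration, and the names `project` and `knn_read` are hypothetical. The memory itself stays fixed; only the mapping from the dialog context into the memory space would be learned.

```python
import math


def project(W, x):
    """Map a context encoding x into the memory space with a linear map W.

    W stands in for the learned part of the read operation (an assumption).
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def knn_read(query, memory, k):
    """Fetch the k memory vectors closest to the query by dot product.

    Returns the selected indices, their softmax weights, and the
    weighted average of the selected vectors.
    """
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in memory]
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    summary = [sum(w * memory[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, weights, summary


# Toy usage: three fixed memory vectors and an identity projection.
memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]
top, weights, summary = knn_read(project(W, [1.0, 0.0]), memory, k=2)
```

In a full model the `summary` vector would be combined with the decoder input; how that combination is done is not specified in the abstract and is left out of the sketch.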

Topics: Dialog box (54%)
Citations

6 results found


Open access | Posted Content
Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela +1 more | Institutions (1)
Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components - retrievers, rankers, and encoder-decoders - with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.

Topics: Conversation (54%), Context (language use) (53%)

20 Citations


Open access | Proceedings Article
Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela +1 more | Institutions (1)
15 Apr 2021
Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components - retrievers, rankers, and encoder-decoders - with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.

Topics: Conversation (54%), Context (language use) (53%)

5 Citations


Open access | Posted Content
Abstract: We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

Topics: Question answering (55%)

3 Citations


Journal Article | DOI: 10.1145/3464377
Longxuan Ma, Mingda Li, Wei-Nan Zhang, Jiapeng Li +1 more | Institutions (1)
Abstract: Incorporating external knowledge into dialogue generation has been proven to benefit the performance of an open-domain Dialogue System (DS), such as generating informative or stylized responses, co...

Topics: Stylized fact (54%)

2 Citations


Open access
01 Nov 2021
Abstract: Large-scale conversation models are turning to leveraging external knowledge to improve the factual accuracy in response generation. Considering the infeasibility to annotate the external knowledge for large-scale dialogue corpora, it is desirable to learn the knowledge selection and response generation in an unsupervised manner. In this paper, we propose PLATO-KAG (Knowledge-Augmented Generation), an unsupervised learning approach for end-to-end knowledge-grounded conversation modeling. For each dialogue context, the top-k relevant knowledge elements are selected and then employed in knowledge-grounded response generation. The two components of knowledge selection and response generation are optimized jointly and effectively under a balanced objective. Experimental results on two publicly available datasets validate the superiority of PLATO-KAG.


References

47 results found


Open access | Proceedings Article
Diederik P. Kingma, Jimmy Ba | Institutions (2)
01 Jan 2015
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
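For readers who want the update rule in concrete form, here is a minimal sketch of one Adam step using first- and second-moment estimates with bias correction. The default values (`lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly cited defaults and do not appear in the abstract above; the function names `init_state` and `adam_step` and the toy objective are assumptions for illustration.

```python
import math


def init_state(n):
    """Create zeroed moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter list."""
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_theta


# Toy usage: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
theta, state = [0.0], init_state(1)
for _ in range(500):
    theta = adam_step(theta, [2 * (theta[0] - 3)], state, lr=0.1)
```

Because the step divides the first moment by the square root of the second, the first update has magnitude close to `lr` regardless of the gradient scale, which is the rescaling invariance the abstract mentions.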

Topics: Stochastic optimization (63%), Convex optimization (54%), Rate of convergence (52%)

78,539 Citations


Open access | Proceedings Article
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit +4 more | Institutions (2)
12 Jun 2017
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
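The attention mechanism at the core of this architecture is usually written as scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. That formula comes from the body of the paper rather than the abstract above. The sketch below is a single-head, unmasked version in plain Python; the function names and the toy inputs are assumptions for illustration, and multi-head projections are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for lists of query, key and value vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs


# Toy usage: a zero query attends uniformly, so the output is the mean of V.
out = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```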

Topics: Machine translation (58%), Encoder (52%), BLEU (51%)

21,996 Citations


Open access | Proceedings Article | DOI: 10.1109/CVPR.2017.634
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu +1 more | Institutions (2)
21 Jul 2017
Abstract: We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call cardinality (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.

Topics: Cardinality (61%), Dimension (vector space) (54%), Set (abstract data type) (53%)

5,343 Citations


Open access | Proceedings Article | DOI: 10.18653/V1/P16-1162
12 Aug 2016
Abstract: Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.
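The byte pair encoding segmentation mentioned in the abstract can be sketched as a loop that repeatedly merges the most frequent adjacent symbol pair. This is an illustrative sketch and not the authors' released code: the function names, the `</w>` end-of-word marker convention, and the toy word counts are assumptions, and ties between equally frequent pairs are broken arbitrarily.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge operations from a word-frequency dict."""
    # Start from characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


# Toy usage: frequent suffixes are merged into single subword units first.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
```

The number of merges is the single hyperparameter that controls the final subword vocabulary size; rare words that were never merged remain split into smaller units.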

5,164 Citations


Open access | Posted Content
Alex Graves, Greg Wayne, Ivo Danihelka | Institutions (1)
Abstract: We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

Topics: Turing machine (69%), Artificial neural network (61%), Von Neumann architecture (57%)

1,323 Citations


Performance
Metrics
No. of citations received by the Paper in previous years
Year  Citations
2022  1
2021  5
Network Information
Related Papers (5)
Lightly Supervised Learning of Procedural Dialog Systems (01 Jan 2013)
Svitlana Volkova, Pallavi Choudhury +3 more
76% related

A Proposal for the Development of Lifelong Dialog Systems (13 May 2019)
David Griol, Araceli Sanchis +1 more
75% related

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data (24 Jul 2020)
Michael Cogswell, Jiasen Lu +4 more
74% related

Cognitive Attention Network (CAN) for Text and Image Multimodal Visual Dialog Systems (05 Nov 2020)
Obinna Agbodike, Chiao-Hua Huang +1 more
74% related

Learning Task Knowledge from Dialog and Web Access (17 Jun 2015, Robotics)
Vittorio Perera, Robin Soetens +6 more
73% related