PLATO-KAG: Unsupervised Knowledge-Grounded Conversation via Joint Modeling

Xinxian Huang, Huang He, Siqi Bao, Fan Wang, Hua Wu, Haifeng Wang

01 Nov 2021, pp. 143-154

TL;DR: This paper proposes an unsupervised learning approach for end-to-end knowledge-grounded conversation modeling, in which the top-k relevant knowledge elements are selected and then used for knowledge-grounded response generation.

Abstract: Large-scale conversation models increasingly leverage external knowledge to improve the factual accuracy of generated responses. Because annotating the relevant external knowledge for large-scale dialogue corpora is infeasible, it is desirable to learn knowledge selection and response generation in an unsupervised manner. In this paper, we propose PLATO-KAG (Knowledge-Augmented Generation), an unsupervised learning approach for end-to-end knowledge-grounded conversation modeling. For each dialogue context, the top-k relevant knowledge elements are selected and then used for knowledge-grounded response generation. The knowledge selection and response generation components are optimized jointly under a balanced objective. Experimental results on two publicly available datasets validate the superiority of PLATO-KAG.
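
The abstract does not specify the relevance scoring function or the exact form of the balanced objective, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes relevance is an inner product between hypothetical precomputed context and knowledge embeddings, that the selection distribution p(z|c) is a softmax restricted to the top-k elements, and that training minimizes the standard RAG-style negative log marginal likelihood -log sum_z p(z|c) p(r|c,z). That marginal objective is a swapped-in, well-known technique, not PLATO-KAG's balanced objective. The generator is abstracted away as a list of log-likelihoods log p(r|c,z), and all function names (`select_top_k`, `joint_nll`, etc.) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, used here as an ASSUMED dual-encoder relevance score."""
    return sum(a * b for a, b in zip(u, v))


def select_top_k(context_emb: Sequence[float],
                 knowledge_embs: Sequence[Sequence[float]],
                 k: int) -> List[Tuple[int, float]]:
    """Score every knowledge element against the context and keep the k best.

    Returns (index, score) pairs sorted by descending score.
    """
    scored = [(i, dot(context_emb, e)) for i, e in enumerate(knowledge_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax, giving log p(z|c) over the top-k set."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def joint_nll(selection_scores: Sequence[float],
              gen_log_likelihoods: Sequence[float]) -> float:
    """RAG-style loss: -log sum_z p(z|c) * p(r|c,z) over the top-k elements.

    selection_scores: relevance scores of the selected knowledge elements.
    gen_log_likelihoods: log p(r|c,z) from a generator, one per element
    (HYPOTHETICAL inputs; no real generator is modeled here).
    """
    log_prior = log_softmax(selection_scores)
    terms = [lp + ll for lp, ll in zip(log_prior, gen_log_likelihoods)]
    m = max(terms)
    # log-sum-exp over candidates, negated to obtain a loss to minimize
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


if __name__ == "__main__":
    context = [1.0, 0.0]
    knowledge = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    top = select_top_k(context, knowledge, k=2)
    print(top)  # [(0, 0.9), (2, 0.5)]
    scores = [s for _, s in top]
    gen_ll = [-2.0, -5.0]  # hypothetical log p(r|c,z) for each selected element
    print(joint_nll(scores, gen_ll))
```

In a setup like this, the loss depends on both the selection scores and the generator log-likelihoods, so gradients would reach both components without any knowledge labels; this is one way selection and generation can be trained jointly in an unsupervised manner.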
