Author

Moya Chen

Bio: Moya Chen is an academic researcher from Facebook. The author has contributed to research on the topics Conversation and Context (language use). The author has an h-index of 2 and has co-authored 3 publications receiving 25 citations.

Topics: Conversation, Context (language use), Language model, Bootstrapping ...

Papers

Posted Content

Retrieval Augmentation Reduces Hallucination in Conversation

Kurt Shuster¹, Spencer Poff¹, Moya Chen¹, Douwe Kiela¹, Jason Weston¹

Institution: Facebook¹

15 Apr 2021 - arXiv: Computation and Language

TL;DR: This paper explores the use of neural retrieval-in-the-loop architectures for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.

Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components - retrievers, rankers, and encoder-decoders - with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.

35 citations

Proceedings Article

Retrieval Augmentation Reduces Hallucination in Conversation

Kurt Shuster¹, Spencer Poff¹, Moya Chen¹, Douwe Kiela¹, Jason Weston¹

Institution: Facebook¹

15 Apr 2021

5 citations

Posted Content

Teaching Models new APIs: Domain-Agnostic Simulators for Task Oriented Dialogue

Moya Chen¹, Paul A. Crook¹, Stephen Roller¹

Institution: Facebook¹

13 Oct 2021 - arXiv: Computation and Language

TL;DR: The authors demonstrate that large language models are able to simulate Task Oriented Dialogues in novel domains, provided only with an API implementation and a list of goals, and they show these simulations can formulate online, automatic metrics that correlate well with human evaluations.

Abstract: We demonstrate that large language models are able to simulate Task Oriented Dialogues in novel domains, provided only with an API implementation and a list of goals. We show these simulations can formulate online, automatic metrics that correlate well with human evaluations. Furthermore, by checking for whether the User's goals are met, we can use simulation to repeatedly generate training data and improve the quality of simulations themselves. With no human intervention or domain-specific training data, our simulations bootstrap end-to-end models which achieve a 37% error reduction in previously unseen domains. By including as few as 32 domain-specific conversations, bootstrapped models can match the performance of a fully-supervised model with 10× more data. To our knowledge, this is the first time simulations have been shown to be effective at bootstrapping models without explicitly requiring any domain-specific training data, rule-engineering, or humans-in-the-loop.

Cited by

Posted Content

Internet-Augmented Dialogue Generation

Mojtaba Komeili, Kurt Shuster, Jason Weston

15 Jul 2021 - arXiv: Artificial Intelligence

TL;DR: This article proposed an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information.

Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen in time at the point of model training. In contrast, we propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information. We train and evaluate such models on a newly collected dataset of human-human conversations whereby one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses. We find that search-query based access of the internet in conversation provides superior performance compared to existing approaches that either use no augmentation or FAISS-based retrieval (Lewis et al., 2020).

18 citations

Journal Article

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Nouha Dziri, Ehsan Kamalloo, Sivan Milton, O. Zaiane, Mo Yu, Edoardo Maria Ponti, Siva Koti Reddy

22 Apr 2022 - Transactions of the Association for Computational Linguistics

TL;DR: This work creates FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark, benchmarks a series of state-of-the-art models, and proposes an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness.

Abstract: The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-DoG and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.

17 citations

Posted Content

Beyond Goldfish Memory: Long-Term Open-Domain Conversation

Jing Xu, Arthur Szlam, Jason Weston

15 Jul 2021 - arXiv: Computation and Language

TL;DR: In this article, the authors collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions, and show that existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations.

Abstract: Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show how existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find retrieval-augmented methods and methods with an ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.

16 citations

Posted Content

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Stephanie Lin, Jacob Hilton¹, Owain Evans

Institution: OpenAI¹

08 Sep 2021 - arXiv: Computation and Language

TL;DR: This paper proposed a benchmark to measure whether a language model is truthful in generating answers to questions, which consists of 817 questions that span 38 categories, including health, law, finance and politics.

Abstract: We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT-3, GPT-Neo/J, GPT-2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. For example, the 6B-parameter GPT-J model was 17% less truthful than its 125M-parameter counterpart. This contrasts with other NLP tasks, where performance improves with model size. However, this result is expected if false answers are learned from the training distribution. We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.

14 citations

Proceedings Article

Internet-Augmented Dialogue Generation

01 Jan 2022

TL;DR: The authors proposed an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information.

9 citations
