DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset

Open Access Proceedings Article

Yanran Li, +5 more

- Vol. 1, pp 986-995

TLDR

This paper develops DailyDialog, a high-quality multi-turn dialog dataset whose language is human-written and less noisy, and whose dialogues reflect everyday communication and cover a variety of topics from daily life.

Abstract:

We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate day to day and cover various topics about our daily life. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems. The dataset is available at http://yanran.li/dailydialog

Citations

Posted Content

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset

Hannah Rashkin, +3 more

- 01 Nov 2018 -

arXiv: Computation and Language

TL;DR: This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.

Proceedings Article

GoEmotions: A Dataset of Fine-Grained Emotions

Dorottya Demszky, +5 more

TL;DR: This paper introduces GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral, and demonstrates the high quality of the annotations via Principal Preserved Component Analysis.

Proceedings Article

MojiTalk: Generating Emotional Responses at Scale

Xianda Zhou, +1 more

TL;DR: This paper collects a large corpus of Twitter conversations that include emojis in the response and investigates several conditional variational autoencoders trained on these conversations, which allow emojis to be used to control the emotion of the generated text.

Proceedings Article

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset

Hannah Rashkin, +3 more

TL;DR: This article proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations. Experiments indicate that dialogue models that use this dataset are perceived as more empathetic by human evaluators than models trained only on large-scale Internet conversation data.

Proceedings Article

An Analysis of Annotated Corpora for Emotion Classification in Text

Laura Ana Maria Bostan, +1 more

TL;DR: A survey of the datasets is carried out, and a subset of corpora is better classified with models trained on a different corpus, which simplifies the choice of the most appropriate resources for developing a model for a novel domain.

References

Proceedings Article

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian Vlad Serban, +6 more

TL;DR: The authors proposed a neural network-based generative architecture with stochastic latent variables that span a variable number of time steps to generate meaningful, long and diverse responses and maintain dialogue state.

News from OPUS — A collection of multilingual parallel corpora with tools and interfaces

Jörg Tiedemann

TL;DR: This article introduces resources that have recently been added to OPUS and discusses the alignment of movie subtitles and the conversion of biomedical documents and localization data to a sentence-aligned XML format.

Proceedings Article

Learning End-to-End Goal-Oriented Dialog

Antoine Bordes, +2 more

TL;DR: In this article, an end-to-end dialog system based on memory networks is proposed for goal-oriented reservation systems, which can reach promising, yet imperfect, performance and learn to perform non-trivial operations.

Proceedings Article

A Robust System for Natural Spoken Dialogue

James F. Allen, +3 more

TL;DR: An evaluation of the system using time-to-completion and the quality of the final solution suggests that most native speakers of English can use the system successfully with virtually no training.

Proceedings Article

A Dataset for Research on Short-Text Conversations

Hao Wang, +3 more

TL;DR: This paper introduces a dataset of short-text conversations based on real-world instances from Sina Weibo. It provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing conversation models.
