Open Access Proceedings Article

DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset

TLDR
This paper develops DailyDialog, a high-quality multi-turn dialog dataset that is intriguing in several aspects: the language is human-written and less noisy, and the dialogues reflect the way we communicate daily and cover various topics about our daily life.
Abstract
We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate daily and cover various topics about our daily life. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems. The dataset is available at http://yanran.li/dailydialog

Citations
Posted Content

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset

TL;DR: This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.
Proceedings Article

GoEmotions: A Dataset of Fine-Grained Emotions

TL;DR: GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral, is introduced, and the high quality of the annotations is demonstrated via Principal Preserved Component Analysis.
Proceedings Article

MojiTalk: Generating Emotional Responses at Scale

TL;DR: This paper collects a large corpus of Twitter conversations that include emojis in the response and investigates several conditional variational autoencoders trained on these conversations, which allow us to use emojis to control the emotion of the generated text.
Proceedings Article

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset

TL;DR: This article proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations; experiments indicate that dialogue models that use the dataset are perceived to be more empathetic by human evaluators than models merely trained on large-scale Internet conversation data.
Proceedings Article

An Analysis of Annotated Corpora for Emotion Classification in Text

TL;DR: A survey of the datasets is carried out, and a subset of corpora is better classified with models trained on a different corpus, which simplifies the choice of the most appropriate resources for developing a model for a novel domain.
References
Proceedings Article

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

TL;DR: The authors proposed a neural network-based generative architecture with stochastic latent variables that span a variable number of time steps to generate meaningful, long and diverse responses and maintain dialogue state.

News from OPUS — A collection of multilingual parallel corpora with tools and interfaces

TL;DR: This article introduces resources that have recently been added to OPUS and discusses the alignment of movie subtitles and the conversion of biomedical documents and localization data to a sentence-aligned XML format.
Proceedings Article

Learning End-to-End Goal-Oriented Dialog

TL;DR: In this article, an end-to-end dialog system based on memory networks is proposed for goal-oriented reservation systems, which can reach promising, yet imperfect, performance and learn to perform non-trivial operations.
Proceedings Article

A Robust System for Natural Spoken Dialogue

TL;DR: An evaluation of the system using time-to-completion and the quality of the final solution suggests that most native speakers of English can use the system successfully with virtually no training.
Proceedings Article

A Dataset for Research on Short-Text Conversations

TL;DR: This paper introduces a dataset of short-text conversations based on real-world instances from Sina Weibo, which provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing conversation models.