
ADAPTIVE DIALOGUE MANAGEMENT USING INTENT
CLUSTERING AND FUZZY RULES
ACCEPTED (PEER-REVIEWED) VERSION
David Griol, Zoraida Callejas
Dept. of Software Engineering, University of Granada
Periodista Daniel Saucedo Aranda s.n., 18071. Granada, Spain
{dgriol, zoraida}@ugr.es
José Manuel Molina, Araceli Sanchis
Dept. of Computer Science,
Universidad Carlos III de Madrid
Avda. de la Universidad, 30, 28911. Leganés, Spain
molina@ia.uc3m.es, masm@inf.uc3m.es
This is the peer-reviewed version of the following article: Griol, D., Callejas, Z., Molina, J.M., Sanchis, A. Adaptive dialogue management using intent clustering and fuzzy rules. Expert Systems. 2020; e12630, which has been published in final form at https://doi.org/10.1111/exsy.12630. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

ABSTRACT
Conversational systems have become an element of everyday life for billions of users who use speech-based interfaces to services, engage with personal digital assistants on smartphones, social media chatbots, or smart speakers. One of the most complex tasks in the development of these systems is to design the dialogue model, the logic that, given a user input, selects the next system answer. The dialogue model must also include mechanisms to adapt the system's responses and interaction style to different user groups and profiles. Rule-based systems are difficult to adapt to phenomena that were not taken into consideration at design time. However, many of the commercially available systems are based on rules, and so are the most widespread tools for the development of chatbots and speech interfaces. In this paper, we present a proposal to: i) automatically generate the dialogue rules from a dialogue corpus through the use of evolving algorithms, and ii) adapt the rules according to the detected user intention. We have evaluated our proposal with several conversational systems from different application domains, showing that our approach provides an efficient way to adapt a set of dialogue rules considering user utterance clusters.
Keywords: Conversational Systems · Dialogue Management · Dialogue Rules · Evolving Classifiers · Clustering · User Modeling
1 Introduction
Conversational systems are computer programs that engage users in a dialogue using natural language [1, 2, 3]. The increasing maturity of speech and conversational technologies has made it possible to integrate conversational interaction in a range of application domains, smart devices, and environments. Some good examples are personal assistants in mobile devices and smart speakers, educational tutoring agents, entertainment chatbots in open domains, online customer services that answer questions about products or services, question answering and triage systems, and robots that offer spoken communication [4, 5, 1, 6].
Spoken conversational systems usually require a sequence of interactions between the user and the system to gradually achieve the user's goals over several dialogue turns. A set of basic actions is repeated after each user utterance: recognize the sequence of words in the speech signal (Automatic Speech Recognition, ASR), understand these words to obtain their meaning, i.e., extract and classify the information pieces that are useful for the domain of the system (Spoken Language Understanding, SLU), consider the context features of the conversation and the results of the queries to the data repositories of the system to decide the next action(s) for the system (Dialogue Management, DM), and generate a spoken message to provide a response to the user (Natural Language Generation and Text-to-Speech Synthesis, NLG and TTS) [7, 1, 3].
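As a rough illustration of this turn-based flow (the module functions below are placeholders, not the actual components discussed in the paper), one user turn can be sketched as follows:

```python
# Minimal sketch of the ASR -> SLU -> DM -> NLG/TTS pipeline described above.
# The callables asr, slu, dm, nlg and tts are placeholders for the real modules.
def dialogue_turn(audio, dialogue_state, asr, slu, dm, nlg, tts):
    """Process one user turn and return the spoken response and updated state."""
    words = asr(audio)                                     # Automatic Speech Recognition
    meaning = slu(words)                                   # Spoken Language Understanding
    action, dialogue_state = dm(meaning, dialogue_state)   # Dialogue Management
    text = nlg(action)                                     # Natural Language Generation
    return tts(text), dialogue_state                       # Text-to-Speech Synthesis
```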
The Dialogue Manager (DM) is the component responsible for determining the next action to be taken by the system given the user input. In order to do so, it must follow a dialogue strategy that can be computed using machine learning approaches or that may follow a predefined set of rules [1, 8, 9].
The design practices of conventional commercial dialogue systems are well established in industry and usually adopt a rule-based approach. This way, voice user interface (VUI) experts [10, 11, 12] handcraft a detailed dialogue plan based on their knowledge about the specific task and the business rules. In addition, designers commonly define the precise wording for the system prompts according to the dialogue state and context, as well as the expected types of users' utterances for each turn.
As described in [13, 14], this approach is well-documented [15] and has been used to develop hundreds of successful commercial dialogue systems. Rule-based approaches are also an efficient alternative when the dialogue system must be available in an embedded device with important hardware constraints [16]. However, the resulting dialogue model lacks the flexibility to adapt the system according to new observations that were not considered at design time. Thus, new approaches must be considered in order to develop dialogue models that can be incorporated into state-of-the-art development environments and at the same time provide enough flexibility.
Most existing dialogue systems collect data to assess and improve their performance, including some form of quality
assessment, time management representation, and business processing aspects. Dialogue systems can employ this
information to provide personalized dialogue management strategies [17, 18, 19], for instance to adapt the services and
information provided by the conversational system, optimize the overall time required to solve the users’ queries and
favour user engagement and fidelity.

While the combination of data with statistical methodologies has been applied to develop and improve different aspects of spoken dialogue systems (e.g., Hidden Markov models and Gaussian mixture models for Automatic Speech Recognition; the Hidden Vector State model, Stochastic Finite State Transducers, Dynamic Bayesian Networks, Support Vector Machines and Conditional Random Fields for Spoken Language Understanding; Partially Observable Markov Decision Processes and Bayesian Networks for Dialogue Management; Markov Decision Processes and Reinforcement Learning for Natural Language Generation) [1], they have not been sufficiently exploited to develop dialogue management models that can be easily adapted to different users, to extensions of the current application domain, or to new application domains. It is also necessary to guarantee the explainability of the solution reached and the flexibility to transform these statistical models into a set of rules, so that they can be implemented using already existing commercial infrastructures, such as Google DialogFlow, IBM Watson or Amazon Lex. These rules must also be designed to operate under partially observable settings.
This paper presents three main contributions:
i) An approach to obtain a dialogue model for a specific task by means of a classification process using fuzzy-rule-based evolving classifiers. This allows obtaining a set of fuzzy rules that can be directly employed to develop a rule-based dialogue manager, making it possible to obtain new-generation interfaces without the need to change already existing commercial infrastructures.
ii) An approach based on clustering to identify user intents, and a mechanism to adapt the system rules to new observed user inputs by associating them to one of the computed clusters.
iii) An evaluation of our proposal with conversational systems developed for domains of very different nature and complexity, with different definitions of the semantics of the task, interaction languages, dialogue initiatives, confirmation, error handling and correction techniques, underlying technologies, and different techniques for acquiring the initial dialogue corpus (human-human, human-machine, Wizard of Oz technique).
The remainder of the paper is organized as follows. In Section 2 we describe the motivation of our proposal and review
the related work in the areas of dialogue management and adaptation methodologies for conversational interfaces.
Section 3 describes our proposal for automatically defining rule-based dialogue models and adapting them according
to the user input clustering. Section 4.1 describes the dialogue systems used for evaluating our proposal. Section 5
presents the experimental set-up, the measures defined to evaluate the dialogue models and a discussion of the results
obtained. Finally, Section 6 presents the conclusions and some guidelines for future work.
2 State of the art: user-adapted conversational interfaces
User adaptation in conversational systems can be achieved by means of the combination of techniques to capture and
represent the information used to characterize and classify users, and flexible dialogue management strategies to adapt
the dialogue model according to these information sources [1, 3].
2.1 Dialogue management methodologies
As described in the previous section, the design of the dialogue model is one of the most important tasks of a
conversational system given that the selection of a specific system action depends on multiple factors (e.g., the outputs
and confidence measures provided by the ASR and SLU modules; the channels and devices used for the interaction;
the results of the queries to the data repositories; restrictions defined for the specific domain of the system; interaction
initiatives directed by the user, system or both; confirmation strategies based on explicit or implicit confirmations, etc.).
Due to these factors, the design of the dialogue model is at the core of conversational interface engineering and is largely responsible for user satisfaction. In addition to considering these factors to decide the next system response, the dialogue manager needs to track the dialogue history and update its representation of the current state of the dialogue [1, 8].
Conversational systems designed for highly structured tasks and system-directed initiatives usually employ finite state-based dialogue models [20], for which the users' utterances are restricted to simple cases within the scope of the ASR and SLU modules. This approach is generally implemented using finite state automata with handcrafted rules. Frame-based dialogue managers [21] use a slot-based frame structure to collect each information piece provided by the user. Users can fill more than one slot per dialogue turn following any order. Plan-based approaches [22] claim that the speaker's speech act is part of a plan and that it is the listener's task to identify and respond appropriately to this plan. Agent-based approaches can be employed when it is necessary to execute and monitor operations in application domains that change dynamically [23].
In all these approaches, application developers, together with voice user interface designers, typically handcraft DM strategies using rules and heuristics. However, there exist approaches to introduce variability in these systems. For example, the emotional virtual agent PRIMER uses a rule-based dialogue manager [24] that is adaptive to the user, as it considers not only the current dialogue state, but also the emotional state of the conversation and the user's progress.
However, in certain scenarios it may be difficult and costly to foresee which form of system behavior will lead to a quick and successful completion of the dialogue. This has motivated the research community to find ways of automating dialogue learning using statistical models trained with real conversations [18, 25, 26, 27]. These models make it possible to explore a wider range of strategies to model the variability of user behaviours.
Statistical approaches to DM can be classified into three main categories: dialogue modeling based on reinforcement
learning (RL), corpus-based statistical dialogue management, and example-based dialogue management. Example-based
approaches can be considered a specific case of corpus-based statistical dialogue management, given that they predict
the next system action when the dialogue manager finds dialogue examples that have a similar dialogue state to the
current one [20].
An extended methodology for learning dialogue strategies models human-computer interaction as an optimization problem using Partially Observable Markov Decision Processes (POMDPs) and reinforcement methods [28, 25]. This approach uses multiple hypotheses of the current dialogue state to explicitly represent uncertainty. However, it is limited to small-scale problems, since it requires many hand-crafted features for the state and action space representation [29]. Other interesting approaches for statistical DM are based on modeling the system by means of Hidden Markov Models (HMMs) [30], stochastic finite-state transducers [26], Bayesian networks [31], recurrent neural networks [32], and deep reinforcement learning [33].
In this paper, we propose a hybrid approach to dialogue management that seeks to combine the benefits of rule-based and statistical techniques in a single framework. To do this, we propose a statistical technique to automatically extract the set of rules for dialogue management from a labeled dialogue corpus. Several authors have followed a similar approach, for example, using discriminative classification models to learn information state updates [34], probabilistic rules whose values are estimated from dialogue data using Bayesian inference [8], or procedural dialogue management methodologies based on defining dialogue trees [35].
Our proposal differs from the aforementioned approaches in several main aspects. Our proposal models the dialogue by means of a classification process based on fuzzy-rule-based evolving classifiers, which considers the complete history of the dialogue as input. Fuzzy logic is an efficient means to reason over the data and provide a varied system behaviour while maintaining the explainability of the solution reached. Recently, [36] employed fuzzy rules to build a recommender system that follows a conversational logic in which the user is prompted about their preferences at each step. In their case, the fuzzy logic is aimed at optimizing the recommendation, while for us it is a way of obtaining a set of rules that can be directly employed to develop a rule-based dialogue manager, making it possible to obtain new-generation interfaces without the need to change already existing commercial infrastructures.
2.2 Identifying user intents
As explained in the introduction, the rules generated for the dialogue manager must be adaptable to different users. A recent review of the literature of the last decade about conversational agents in business [6] has highlighted, as one of the challenges for the future, the definition and management of user profiles, the context of interaction, and methods to detect the user's intention in order to provide personalized responses.
Indeed, to provide a positive user experience, dialogue systems should ideally adapt to the behavior of individual users or particular user groups. User profiling and behaviour adaptation is also a hot topic in human-robot interaction, especially for the detection of social cues [37].
In order to be user-aware, dialogue systems must identify user profiles and be able to tailor the interaction to them. In [38], the authors present three basic forms of showing user awareness in conversational agents: being aware of the previous interactions, re-prompting users when there is no input, and rewording and re-prompting messages with more detailed information based on previous interaction. These guidelines are valid for all system users and at the same time consider specific information about each individual. Another approach is to group users that behave similarly and generate adaptation categories.
Research in techniques for user modeling has a long history within the field of conversational systems [9]. Most of the proposed user models are used to generate dialogue corpora by means of the interaction between the conversational system and the user model. Different types of statistical techniques have been proposed to achieve this goal: n-gram models [39], graph-based models [40], Hidden Markov Models [41], logistic regression [42], and EM-based algorithms [43]. These models have been used to automatically evaluate conversational systems (particularly at the earlier stages of development), determine the effects of changes to the system's functionalities, or evaluate the capacity of the dialogue manager to react to unexpected situations. However, these models are not used to define user profiles and adapt the dialogue manager.
A recent survey of intent detection methods for dialogue systems is presented in [7]. This is one of the main tasks of the NLU module, which must detect the dialogue acts in the user utterance and provide this information to the dialogue manager. A range of statistical techniques have been proposed to complete this task, from traditional techniques, such as Naive Bayes [44] or Support Vector Machines [45], to mainstream methods, such as Convolutional Neural Networks [46], word embeddings [47], Recurrent Neural Networks [48], Long Short-Term Memory Networks [49] or Gated Recurrent Units [50].
Clustering techniques have been proposed to avoid the time-consuming task of manually analyzing the set of user questions and creating a taxonomy of intents to be assigned to the appropriate system responses [51]. These techniques have been employed to mine dialogue corpora with different purposes. For example, [52] used clustering to group different types of comments provided by annotators and used them to anticipate dialogue breakdown. [53] proposes a clustering technique that extends the standard K-means algorithm to simultaneously cluster user and system utterances considering their adjacency information.
Clustering relies on distance measures between the user inputs being classified. Measuring semantic text similarity has been a key topic in Artificial Intelligence, Natural Language Processing, and Information Retrieval for many years [54]. Different techniques have been proposed for this task [55]. A first group of techniques uses bags of words to represent documents as vectors and calculates distances in the vector space model using cosine similarity. A second group of techniques is based on the assumption that two text sequences are semantically equivalent if their words or expressions can be aligned. A third group uses machine learning approaches that combine different measures, features and techniques: graph-based approaches [56], word embeddings [57], convolutional neural networks [58], dynamic finite state machines [59], etc.
[60] showed the feasibility of this approach in application domains with a varied number of slots and with both real and simulated users. [61] describes a modification of the K-Means algorithm adapted to question-answering websites and forums. [62] presents a proposal to use clustering to create directed graphs with the main transitions between topics among dialogues.
These techniques are also proposed in [63, 64] to identify the topics and sequences of topics during the dialogue. Different methods are then proposed to use this information to manage the dialogue (i.e., to model the transitions between topics): Hidden Markov Models Paul12, Latent Dirichlet Allocation [65], Deep Learning techniques [66], POMDPs [67] or ad-hoc methods based on the definition of heuristics [68]. We have used clustering in previous work to generate user profiles that can be of interest to assess the quality of the system and/or predict user satisfaction [69].
The method that we propose to identify user intents is based on measuring the semantic textual similarity between the user sentences available in a data corpus for the target domain. This information is then used with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [70] to obtain an initial representation of the domain based on the computed distance matrix. The set of clusters obtained represents the users' sentences that can be assigned to the same system intent. When the system receives a new user input, it is assigned to one of the clusters, and this information is taken into account in the conditions of the fuzzy dialogue rules for a fine-grained decision of the next system response, setting the basis for an integrated adaptive dialogue manager development procedure that can be easily interpreted, adapted and implemented with commercial platforms.
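As an illustration of this clustering step, the following sketch runs scikit-learn's DBSCAN over a precomputed pairwise distance matrix and assigns a new utterance to the cluster of its nearest clustered neighbour. The parameter values and the nearest-neighbour assignment rule are illustrative assumptions, not the settings reported in the paper.

```python
# Minimal sketch: cluster user utterances with DBSCAN over a precomputed
# distance matrix, then assign a new utterance to an existing cluster.
# eps, min_samples and the assignment rule are illustrative only.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_utterances(distance_matrix, eps=0.4, min_samples=3):
    """Run DBSCAN on a square matrix of pairwise utterance distances."""
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(distance_matrix)  # label -1 marks noise points

def assign_to_cluster(new_distances, labels):
    """Assign a new utterance to the cluster of its nearest clustered neighbour."""
    for idx in np.argsort(new_distances):
        if labels[idx] != -1:
            return labels[idx]
    return -1  # no suitable cluster found
```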
3 Our proposal for defining user’s adaptive dialogue models
Figure 1 shows the architecture of the proposed framework to design rule-based dialogue managers. This figure shows
the training and run time operations of the proposal. In the training stage, the user utterances that are available in the
training corpus are used to calculate the text similarities among them.
For the corpus in English we have computed the semantic textual similarity (STS) between each pair of user phrases in the corpus using the API of the University of Maryland [54]. This API uses a hybrid word similarity model based on the Latent Semantic Analysis (LSA) word similarity hypothesis (words occurring in the same contexts tend to have similar meanings), trained on a corpus of three billion words. The word co-occurrence models are combined with WordNet to boost LSA similarity and deal with words with many senses.
The previous measure is not available for Spanish, so for languages other than English we have used cosine similarity, a widespread metric for comparing documents. It measures the angle between two vectors projected in a multidimensional space; in our case, two arrays containing the word counts, where each dimension corresponds to a word in the user input. To perform the calculations we have implemented a script in Python using the scikit-learn library to vectorize the phrases, pandas and scipy to generate the data frames and the square distance matrix, and sklearn for the computation of cosine distance.
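A minimal sketch of this computation, assuming the user phrases are available as plain strings (the function and variable names are illustrative, not taken from the paper's script):

```python
# Minimal sketch: bag-of-words vectors and a square matrix of pairwise
# cosine distances between user utterances. Names are illustrative only.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def utterance_distance_matrix(utterances):
    """Return a DataFrame of pairwise cosine distances between utterances."""
    counts = CountVectorizer().fit_transform(utterances)  # word-count vectors
    distances = cosine_distances(counts)                  # 1 - cosine similarity
    return pd.DataFrame(distances, index=utterances, columns=utterances)

# Example usage (hypothetical phrases):
# matrix = utterance_distance_matrix(["a ticket to Madrid", "timetables to Madrid"])
```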

Frequently Asked Questions (20)
Q1. What are the contributions in this paper?

One of the most complex tasks in the development of these systems is to design the dialogue model, the logic that, given a user input, selects the next answer. In this paper, the authors present a proposal to: i) automatically generate the dialogue rules from a dialogue corpus through the use of evolving algorithms, and ii) adapt the rules according to the detected user intention. The authors have evaluated their proposal with several conversational systems of different application domains, showing that their approach provides an efficient way to adapt a set of dialogue rules considering user utterance clusters.

For future work the authors plan to apply the proposed technique to multi-domain tasks in order to measure the capability of their methodology to adapt efficiently to contexts that can vary dynamically. Finally, the authors also intend to extend the evaluation of the system considering additional measures related to user ’ s profiles that complement the proposed adaptation. 

The corpus consists of 6,280 user turns and 9,133 system turns, with an average of 7.6 words per turn and a vocabulary size of 811 words.

A total of 51 system responses were defined for the task (classified into confirmations of concepts and attributes, questions to request data from the user, and answers obtained after a query to the database).

A total of seven clusters have been defined for the Let’s Go task, seven for the Dihana Task, and eight clusters for the Edecan task. 

A corpus of 900 dialogues (10.8 hours) was acquired for the task by means of the Wizard of Oz (WoZ) technique with 225 real users, for which an initial dialogue strategy was defined by experts. 

A fuzzy rule in the eClass0 model has the following structure:

Rule_i = IF (Feature_1 is P_1) AND ... AND (Feature_n is P_n) AND (Cluster is C_m) THEN Class = c_i

where i represents the number of the rule; n is the number of input features (the number of entities corresponding to the different slots defined for the DR); the vector Feature stores the observed features, whereas the vector P stores the values of the features of one of the prototypes (coded in terms of the three possible values, {0, 1, 2}).
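As a purely illustrative sketch (this is not the eClass0 learning algorithm, and the exact-match firing criterion is a simplification chosen only for clarity), a learned rule of this form could be represented and checked as follows:

```python
# Illustrative sketch only: a container for a rule of the form shown above and
# a simple firing check. The exact-match criterion is an assumption, not the
# fuzzy membership computation used by eClass0.
from dataclasses import dataclass
from typing import List

@dataclass
class DialogueRule:
    prototype: List[int]   # expected feature values, each coded in {0, 1, 2}
    cluster: int           # user-intent cluster C_m required by the rule
    system_action: str     # class c_i, i.e., the next system response

def rule_fires(rule: DialogueRule, features: List[int], cluster: int) -> bool:
    """Return True if the observed features and intent cluster satisfy the rule."""
    return cluster == rule.cluster and features == rule.prototype

# Example usage (hypothetical values):
# rule = DialogueRule(prototype=[1, 0, 2], cluster=3, system_action="ask_destination")
# rule_fires(rule, [1, 0, 2], 3)  # -> True
```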

With regard to previous studies that have used the Edecan corpus, 78.80% of successful dialogues were obtained with the statistical dialogue model and user simulation techniques proposed in [18], and 73.40% of successful dialogues were obtained with the stochastic finite-state transducer model proposed in [74].

With regard to previous studies that have used the Dihana corpus, 72.60% of successful dialogues were obtained with the statistical dialogue model and user simulation techniques proposed in [79], 82.90% with the HMM-based model proposed in [80], and 83.64% with the statistical methodology proposed in [27].

The codification developed to represent the state of the dialogue and the good operation of the MLP classifiers make it possible to reduce the number of responses that cause the failure of the system to only 4.23%, 5.25% and 5.11% for the DM2 dialogue models, improving on the percentages obtained with the DM1 dialogue models.

The results, shown in Table 4, indicate that using the DM2 dialogue models there was an increase in the number of system turns that actually provide information to the user, which is consistent with the fact that the task completion rate is higher using their proposed dialogue model.

With regard to a version of the system developed with DUDE [77], 62% of the calls reached the stage of presenting results to the user.