
ADAPTIVE DIALOGUE MANAGEMENT USING INTENT
CLUSTERING AND FUZZY RULES
ACCEPTED (PEER-REVIEWED) VERSION
David Griol, Zoraida Callejas
Dept. of Software Engineering, University of Granada
Periodista Daniel Saucedo Aranda s.n., 18071. Granada, Spain
{dgriol, zoraida}@ugr.es
José Manuel Molina, Araceli Sanchis
Dept. of Computer Science,
Universidad Carlos III de Madrid
Avda. de la Universidad, 30, 28911. Leganés, Spain
molina@ia.uc3m.es, masm@inf.uc3m.es
This is the peer-reviewed version of the following article: Griol, D., Callejas, Z., Molina, J.M., Sanchis, A. Adaptive dialogue management using intent clustering and fuzzy rules. Expert Systems. 2020; e12630, which has been published in final form at https://doi.org/10.1111/exsy.12630. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

ABSTRACT
Conversational systems have become an element of everyday life for billions of users who use speech-based interfaces to services, engage with personal digital assistants on smartphones, social media chatbots, or smart speakers. One of the most complex tasks in the development of these systems is to design the dialogue model, the logic that, given a user input, selects the next system answer. The dialogue model must also include mechanisms to adapt the system's responses and interaction style to different user groups and profiles. Rule-based systems are difficult to adapt to phenomena that were not taken into consideration at design time. However, many of the commercially available systems are based on rules, and so are the most widespread tools for the development of chatbots and speech interfaces. In this paper, we present a proposal to: i) automatically generate the dialogue rules from a dialogue corpus through the use of evolving algorithms, and ii) adapt the rules according to the detected user intention. We have evaluated our proposal with several conversational systems from different application domains, showing that our approach provides an efficient way to adapt a set of dialogue rules considering user utterance clusters.
Keywords: Conversational Systems · Dialogue Management · Dialogue Rules · Evolving Classifiers · Clustering · User Modeling
1 Introduction
Conversational systems are computer programs that engage users in a dialogue using natural language [1, 2, 3]. The increasing maturity of speech and conversational technologies has made it possible to integrate conversational interaction in a range of application domains, smart devices, and environments. Some good examples are personal assistants in mobile devices and smart speakers, educational tutoring agents, entertainment chatbots in open domains, online customer services that answer questions about products or services, question answering and triage systems, and robots that offer spoken communication [4, 5, 1, 6].
Spoken conversational systems usually require a sequence of interactions between the user and the system to gradually achieve the user's goals over several dialogue turns. A set of basic actions is repeated after each user utterance: recognize the sequence of words in the speech signal (Automatic Speech Recognition, ASR), understand these words to obtain their meaning, i.e., extract and classify the information pieces that are useful for the domain of the system (Spoken Language Understanding, SLU), consider the context features of the conversation and the results of the queries to the data repositories of the system to decide the next action(s) for the system (Dialogue Management, DM), and generate a spoken message to provide a response to the user (Natural Language Generation and Text-to-Speech Synthesis, NLG and TTS) [7, 1, 3].
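As a rough illustration of this turn-based flow (the module functions below are placeholders, not the actual components discussed in the paper), one user turn can be sketched as follows:

```python
# Minimal sketch of the ASR -> SLU -> DM -> NLG/TTS pipeline described above.
# The callables asr, slu, dm, nlg and tts are placeholders for the real modules.
def dialogue_turn(audio, dialogue_state, asr, slu, dm, nlg, tts):
    """Process one user turn and return the spoken response and updated state."""
    words = asr(audio)                                     # Automatic Speech Recognition
    meaning = slu(words)                                   # Spoken Language Understanding
    action, dialogue_state = dm(meaning, dialogue_state)   # Dialogue Management
    text = nlg(action)                                     # Natural Language Generation
    return tts(text), dialogue_state                       # Text-to-Speech Synthesis
```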
The Dialogue Manager (DM) is the component responsible for determining the next action to be taken by the system given the user input. In order to do so, it must follow a dialogue strategy that can be computed using machine learning approaches or that may follow a predefined set of rules [1, 8, 9].
The design practices of conventional commercial dialogue systems are well established in industry and usually adopt a rule-based approach. This way, voice user interface (VUI) experts [10, 11, 12] handcraft a detailed dialogue plan based on their knowledge about the specific task and the business rules. In addition, designers commonly define the precise wording for the system prompts according to the dialogue state and context, as well as the expected types of users' utterances for each turn.
As described in [13, 14], this approach is well-documented [15] and has been used to develop hundreds of successful commercial dialogue systems. Rule-based approaches are also an efficient alternative when the dialogue system must be available in an embedded device with important hardware constraints [16]. However, the resulting dialogue model lacks the flexibility to adapt the system according to new observations that were not considered at design time. Thus, new approaches must be considered in order to develop dialogue models that can be incorporated into state-of-the-art development environments and at the same time provide enough flexibility.
Most existing dialogue systems collect data to assess and improve their performance, including some form of quality
assessment, time management representation, and business processing aspects. Dialogue systems can employ this
information to provide personalized dialogue management strategies [17, 18, 19], for instance to adapt the services and
information provided by the conversational system, optimize the overall time required to solve the users’ queries and
favour user engagement and fidelity.

While the combination of data with statistical methodologies has been applied to develop and improve different aspects of spoken dialogue systems (e.g., Hidden Markov models and Gaussian mixture models for Automatic Speech Recognition; the Hidden Vector State model, Stochastic Finite State Transducers, Dynamic Bayesian Networks, Support Vector Machines and Conditional Random Fields for Spoken Language Understanding; Partially Observable Markov Decision Processes and Bayesian Networks for Dialogue Management; Markov Decision Processes and Reinforcement Learning for Natural Language Generation) [1], they have not been sufficiently exploited to develop dialogue management models that can be easily adapted to different users, to extensions of the current application domain, or to new application domains. It is also necessary to guarantee the explainability of the solution reached and the flexibility to transform these statistical models into a set of rules, so that they can be implemented using already existing commercial infrastructures, such as Google DialogFlow, IBM Watson or Amazon Lex. These rules must also be designed to operate under partially observable settings.
This paper presents three main contributions:
i) An approach to obtain a dialogue model for a specific task by means of a classification process using fuzzy-rule-based evolving classifiers. This allows obtaining a set of fuzzy rules that can be directly employed to develop a rule-based dialogue manager, making it possible to obtain new-generation interfaces without the need to change already existing commercial infrastructures.
ii) An approach based on clustering to identify user intents, and a mechanism to adapt the system rules to new observed user inputs by associating them to one of the computed clusters.
iii) An evaluation of our proposal with conversational systems developed for domains of very different nature and complexity, with different definitions of the semantics of the task, interaction languages, dialogue initiatives, confirmation, error handling and correction techniques, underlying technologies, and different techniques for acquiring the initial dialogue corpus (human-human, human-machine, Wizard of Oz technique).
The remainder of the paper is organized as follows. In Section 2 we describe the motivation of our proposal and review
the related work in the areas of dialogue management and adaptation methodologies for conversational interfaces.
Section 3 describes our proposal for automatically defining rule-based dialogue models and adapting them according
to the user input clustering. Section 4.1 describes the dialogue systems used for evaluating our proposal. Section 5
presents the experimental set-up, the measures defined to evaluate the dialogue models and a discussion of the results
obtained. Finally, Section 6 presents the conclusions and some guidelines for future work.
2 State of the art: user-adapted conversational interfaces
User adaptation in conversational systems can be achieved by means of the combination of techniques to capture and
represent the information used to characterize and classify users, and flexible dialogue management strategies to adapt
the dialogue model according to these information sources [1, 3].
2.1 Dialogue management methodologies
As described in the previous section, the design of the dialogue model is one of the most important tasks of a
conversational system given that the selection of a specific system action depends on multiple factors (e.g., the outputs
and confidence measures provided by the ASR and SLU modules; the channels and devices used for the interaction;
the results of the queries to the data repositories; restrictions defined for the specific domain of the system; interaction
initiatives directed by the user, system or both; confirmation strategies based on explicit or implicit confirmations, etc.).
Due to these factors, the design of the dialogue model is at the core of conversational interface engineering and is largely responsible for user satisfaction. In addition to considering these factors to decide the next system response, the dialogue manager needs to track the dialogue history and update its representation of the current state of the dialogue [1, 8].
Conversational systems designed for highly structured tasks and system-directed initiatives usually employ finite state-based dialogue models [20], for which the users' utterances are restricted to simple cases within the scope of the ASR and SLU modules. This approach is generally implemented using finite state automata with handcrafted rules. Frame-based dialogue managers [21] use a slot-based frame structure to collect each information piece provided by the user. Users can fill more than one slot per dialogue turn following any order. Plan-based approaches [22] claim that the speaker's speech act is part of a plan and that it is the listener's task to identify and respond appropriately to this plan. Agent-based approaches can be employed when it is necessary to execute and monitor operations in application domains that change dynamically [23].
In all these approaches, application developers, together with voice user interface designers, typically handcraft DM strategies using rules and heuristics. However, there exist approaches to introduce variability in these systems. For example, the emotional virtual agent PRIMER uses a rule-based dialogue manager [24] that is adaptive to the user, as it considers not only the current dialogue state, but also the emotional state of the conversation and the user's progress.
However, in certain scenarios it may be difficult and costly to foresee which form of system behavior will lead to a quick and successful completion of the dialogue. This has motivated the research community to find ways of automating dialogue learning using statistical models trained with real conversations [18, 25, 26, 27]. These models make it possible to explore a wider range of strategies to model the variability of user behaviours.
Statistical approaches to DM can be classified into three main categories: dialogue modeling based on reinforcement
learning (RL), corpus-based statistical dialogue management, and example-based dialogue management. Example-based
approaches can be considered a specific case of corpus-based statistical dialogue management, given that they predict
the next system action when the dialogue manager finds dialogue examples that have a similar dialogue state to the
current one [20].
An extended methodology for learning dialogue strategies models human-computer interaction as an optimization problem using Partially Observable Markov Decision Processes (POMDPs) and reinforcement methods [28, 25]. This approach uses multiple hypotheses of the current dialogue state to explicitly represent uncertainty. However, it is limited to small-scale problems, since it requires many hand-crafted features for the state and action space representation [29]. Other interesting approaches for statistical DM are based on modeling the system by means of Hidden Markov Models (HMMs) [30], stochastic finite-state transducers [26], Bayesian networks [31], recurrent neural networks [32], and deep reinforcement learning [33].
In this paper, we propose a hybrid approach to dialogue management that seeks to combine the benefits of rule-based and statistical techniques in a single framework. To do this, we propose a statistical technique to automatically extract the set of rules for dialogue management from a labeled dialogue corpus. Several authors have followed a similar approach, for example, using discriminative classification models to learn information state updates [34], probabilistic rules whose values are estimated from dialogue data using Bayesian inference [8], or procedural dialogue management methodologies based on defining dialogue trees [35].
Our proposal differs from the aforementioned approaches in several main aspects. Our proposal models the dialogue by means of a classification process based on fuzzy-rule-based evolving classifiers, which considers the complete history of the dialogue as input. Fuzzy logic is an efficient means to reason over the data and provide a varied system behaviour while maintaining the explainability of the solution reached. Recently, [36] employed fuzzy rules to build a recommender system that follows a conversational logic in which the user is prompted about their preferences at each step. In their case, the fuzzy logic is aimed at optimizing the recommendation, while for us it is a way of obtaining a set of rules that can be directly employed to develop a rule-based dialogue manager, making it possible to obtain new-generation interfaces without the need to change already existing commercial infrastructures.
2.2 Identifying user intents
As explained in the introduction, the rules generated for the dialogue manager must be adaptable to different users. A recent review of the literature of the last decade about conversational agents in business [6] has highlighted, as one of the challenges for the future, the definition and management of user profiles, the context of interaction, and methods to detect the user's intention in order to provide personalized responses.
Indeed, to provide a positive user experience, dialogue systems should ideally adapt to the behavior of individual users or particular user groups. User profiling and behaviour adaptation is also a hot topic in human-robot interaction, especially for the detection of social cues [37].
In order to be user-aware, dialogue systems must identify user profiles and be able to tailor the interaction to them. In [38], the authors present three basic forms of showing user awareness in conversational agents: being aware of the previous interactions, re-prompting users when there is no input, and rewording and re-prompting messages with more detailed information based on previous interaction. These guidelines are valid for all system users and at the same time consider specific information about each individual. Another approach is to group users that behave similarly and generate adaptation categories.
Research in techniques for user modeling has a long history within the field of conversational systems [9]. Most of the proposed user models are used to generate dialogue corpora by means of the interaction between the conversational system and the user model. Different types of statistical techniques have been proposed to achieve this goal: n-gram models [39], graph-based models [40], Hidden Markov Models [41], logistic regression [42], and EM-based algorithms [43]. These models have been used to automatically evaluate conversational systems (particularly at the earlier stages of development), determine the effects of changes to the system's functionalities, or evaluate the capacity of the dialogue manager to react to unexpected situations. However, these models are not used to define user profiles and adapt the dialogue manager.
A recent survey of intent detection methods for dialogue systems is presented in [7]. This is one of the main tasks of the NLU module, which must detect the dialogue acts in the user utterance and provide this information to the dialogue manager. A range of statistical techniques have been proposed to complete this task, from traditional techniques, such as Naive Bayes [44] or Support Vector Machines [45], to mainstream methods, such as Convolutional Neural Networks [46], word embeddings [47], Recurrent Neural Networks [48], Long Short-Term Memory Networks [49] or Gated Recurrent Units [50].
Clustering techniques have been proposed to avoid the time-consuming task of manually analyzing the set of user questions and creating a taxonomy of intents to be assigned to the appropriate system responses [51]. These techniques have been employed to mine dialogue corpora with different purposes. For example, [52] used clustering to group different types of comments provided by annotators and used them to anticipate dialogue breakdown. [53] proposes a clustering technique that extends the standard K-means algorithm to simultaneously cluster user and system utterances considering their adjacency information.
Clustering relies on distance measures between the user inputs being classified. Measuring semantic text similarity has been a key topic in Artificial Intelligence, Natural Language Processing, and Information Retrieval for many years [54]. Different techniques have been proposed for this task [55]. A first group of techniques uses bags of words to represent documents as vectors and calculates distances in the vector space model using cosine similarity. A second group of techniques is based on the assumption that two text sequences are semantically equivalent if their words or expressions can be aligned. A third group uses machine learning approaches that combine different measures, features and techniques: graph-based approaches [56], word embeddings [57], convolutional neural networks [58], dynamic finite state machines [59], etc.
[60] showed the feasibility of this approach in application domains with a varied number of slots and with both real and simulated users. [61] describes a modification of the K-Means algorithm adapted to question-answering websites and forums. [62] presents a proposal to use clustering to create directed graphs with the main transitions between topics among dialogues.
These techniques are also proposed in [63, 64] to identify the topics and sequences of topics during the dialogue. Different methods are then proposed to use this information to manage the dialogue (i.e., to model the transitions between topics): Hidden Markov Models Paul12, Latent Dirichlet Allocation [65], Deep Learning techniques [66], POMDPs [67] or ad-hoc methods based on the definition of heuristics [68]. We have used clustering in previous work to generate user profiles that can be of interest to assess the quality of the system and/or predict user satisfaction [69].
The method that we propose to identify user intents is based on measuring the semantic textual similarity between the user sentences available in a data corpus for the target domain. This information is then used with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [70] to obtain an initial representation of the domain based on the computed distance matrix. The set of clusters obtained represents the users' sentences that can be assigned to the same system intent. When the system receives a new user input, it is assigned to one of the clusters, and this information is taken into account in the conditions of the fuzzy dialogue rules for a fine-grained decision of the next system response, setting the basis for an integrated adaptive dialogue manager development procedure that can be easily interpreted, adapted and implemented with commercial platforms.
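As an illustration of this clustering step, the following sketch runs scikit-learn's DBSCAN over a precomputed pairwise distance matrix and assigns a new utterance to the cluster of its nearest clustered neighbour. The parameter values and the nearest-neighbour assignment rule are illustrative assumptions, not the settings reported in the paper.

```python
# Minimal sketch: cluster user utterances with DBSCAN over a precomputed
# distance matrix, then assign a new utterance to an existing cluster.
# eps, min_samples and the assignment rule are illustrative only.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_utterances(distance_matrix, eps=0.4, min_samples=3):
    """Run DBSCAN on a square matrix of pairwise utterance distances."""
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(distance_matrix)  # label -1 marks noise points

def assign_to_cluster(new_distances, labels):
    """Assign a new utterance to the cluster of its nearest clustered neighbour."""
    for idx in np.argsort(new_distances):
        if labels[idx] != -1:
            return labels[idx]
    return -1  # no suitable cluster found
```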
3 Our proposal for defining user’s adaptive dialogue models
Figure 1 shows the architecture of the proposed framework to design rule-based dialogue managers. This figure shows
the training and run time operations of the proposal. In the training stage, the user utterances that are available in the
training corpus are used to calculate the text similarities among them.
For the corpus in English we have computed the semantic textual similarity (STS) between each pair of user phrases in the corpus using the API of the University of Maryland [54]. This API uses a hybrid word similarity model based on the Latent Semantic Analysis (LSA) word similarity hypothesis (words occurring in the same contexts tend to have similar meanings), trained on a corpus of three billion words. The word co-occurrence models are combined with WordNet to boost LSA similarity and deal with words with many senses.
The previous measure is not available for Spanish, so for languages other than English we have used cosine similarity, a widespread metric for comparing documents. It measures the angle between two vectors projected in a multidimensional space; in our case, two arrays containing the word counts, where each dimension corresponds to a word in the user input. To perform the calculations we have implemented a script in Python using the scikit-learn library to vectorize the phrases, pandas and scipy to generate the data frames and the square distance matrix, and sklearn for the computation of cosine distance.
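A minimal sketch of this computation, assuming the user phrases are available as plain strings (the function and variable names are illustrative, not taken from the paper's script):

```python
# Minimal sketch: bag-of-words vectors and a square matrix of pairwise
# cosine distances between user utterances. Names are illustrative only.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def utterance_distance_matrix(utterances):
    """Return a DataFrame of pairwise cosine distances between utterances."""
    counts = CountVectorizer().fit_transform(utterances)  # word-count vectors
    distances = cosine_distances(counts)                  # 1 - cosine similarity
    return pd.DataFrame(distances, index=utterances, columns=utterances)

# Example usage (hypothetical phrases):
# matrix = utterance_distance_matrix(["a ticket to Madrid", "timetables to Madrid"])
```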

Frequently Asked Questions (20)
Q1. What are the contributions in this paper?

One of the most complex tasks in the development of these systems is to design the dialogue model, the logic that, given a user input, selects the next answer. In this paper, the authors present a proposal to: i) automatically generate the dialogue rules from a dialogue corpus through the use of evolving algorithms, and ii) adapt the rules according to the detected user intention. The authors have evaluated their proposal with several conversational systems of different application domains, showing that their approach provides an efficient way to adapt a set of dialogue rules considering user utterance clusters.

For future work the authors plan to apply the proposed technique to multi-domain tasks in order to measure the capability of their methodology to adapt efficiently to contexts that can vary dynamically. Finally, the authors also intend to extend the evaluation of the system considering additional measures related to user ’ s profiles that complement the proposed adaptation. 

The corpus consists of 6,280 user turns and 9,133 system turns, with an average of 7.6 words per turn and a vocabulary size of 811 words.

A total of 51 system responses were defined for the task (classified into confirmations of concepts and attributes, questions to request data from the user, and answers obtained after a query to the database).

A total of seven clusters have been defined for the Let’s Go task, seven for the Dihana Task, and eight clusters for the Edecan task. 

A corpus of 900 dialogues (10.8 hours) was acquired for the task by means of the Wizard of Oz (WoZ) technique with 225 real users, for which an initial dialogue strategy was defined by experts. 

A fuzzy rule in the eClass0 model has the following structure:

Rule_i = IF (Feature_1 is P_1) AND ... AND (Feature_n is P_n) AND (Cluster is C_m) THEN Class = c_i

where i represents the number of the rule; n is the number of input features (the number of entities corresponding to the different slots defined for the DR); the vector Feature stores the observed features, whereas the vector P stores the values of the features of one of the prototypes (coded in terms of the three possible values, {0, 1, 2}).
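As a purely illustrative sketch (this is not the eClass0 learning algorithm, and the exact-match firing criterion is a simplification chosen only for clarity), a learned rule of this form could be represented and checked as follows:

```python
# Illustrative sketch only: a container for a rule of the form shown above and
# a simple firing check. The exact-match criterion is an assumption, not the
# fuzzy membership computation used by eClass0.
from dataclasses import dataclass
from typing import List

@dataclass
class DialogueRule:
    prototype: List[int]   # expected feature values, each coded in {0, 1, 2}
    cluster: int           # user-intent cluster C_m required by the rule
    system_action: str     # class c_i, i.e., the next system response

def rule_fires(rule: DialogueRule, features: List[int], cluster: int) -> bool:
    """Return True if the observed features and intent cluster satisfy the rule."""
    return cluster == rule.cluster and features == rule.prototype

# Example usage (hypothetical values):
# rule = DialogueRule(prototype=[1, 0, 2], cluster=3, system_action="ask_destination")
# rule_fires(rule, [1, 0, 2], 3)  # -> True
```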

With regard to previous studies that have used the Edecan corpus, 78.80% of successful dialogues were obtained with the statistical dialogue model and user simulation techniques proposed in [18], and 73.40% of successful dialogues were obtained with the stochastic finite-state transducer model proposed in [74].

With regard to previous studies that have used the Dihana corpus, 72.60% of successful dialogues were obtained with the statistical dialogue model and user simulation techniques proposed in [79], 82.90% with the HMM-based model proposed in [80], and 83.64% with the statistical methodology proposed in [27].

The codification developed to represent the state of the dialogue and the good operation of the MLP classifiers make it possible to reduce the number of responses that cause the failure of the system to only 4.23%, 5.25% and 5.11% for the DM2 dialogue models, improving on the percentages obtained with the DM1 dialogue models.

The results, shown in Table 4, indicate that using the DM2 dialogue models there was an increase in the number of system turns that actually provide information to the user, which is consistent with the fact that the task completion rate is higher using their proposed dialogue model.

With regard to a version of the system developed with DUDE [77], 62% of the calls reached the stage of presenting results to the user.