
Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures


Adel, Naeemeh (ORCID: https://orcid.org/0000-0003-4449-7410), Crockett, Keeley (ORCID: https://orcid.org/0000-0003-1941-6201), Chandran, David and Carvalho, Joao (2020) Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures. In: IEEE World Congress on Computational Intelligence - IEEE FUZZ 2020, 19 July 2020 - 24 July 2020, Glasgow, UK (virtual congress).
Downloaded from: https://e-space.mmu.ac.uk/625464/
Publisher: IEEE
DOI: https://doi.org/10.1109/FUZZ48607.2020.9177605
Please cite the published version
https://e-space.mmu.ac.uk

Interpreting Human Responses in Dialogue Systems
using Fuzzy Semantic Similarity Measures
Naeemeh Adel, Keeley Crockett
School of Computing, Mathematics and Digital Technology,
Manchester Metropolitan University, Chester Street,
Manchester, M1 5GD, UK
N.Adel@mmu.ac.uk
David Chandran
Institute of Psychiatry, Psychology & Neuroscience, King's
College London, 16 De Crespigny Park, London,
SE5 8AF, UK
Joao P. Carvalho
INESC-ID / Instituto Superior Tecnico, Universidade de
Lisboa, Portugal
Abstract— Dialogue systems are automated systems that interact with humans using natural language. Much work has been done on dialogue management and learning using a range of computational intelligence based approaches; however, the complexity of human dialogue in different contexts still presents many challenges. The key impact of the work presented in this paper is to use fuzzy semantic similarity measures embedded within a dialogue system to allow a machine to semantically comprehend human utterances in a given context and thus communicate more effectively with a human in a specific domain using natural language. To achieve this, perception-based words should be understood by a machine in the context of the dialogue. In this work, a simple question and answer dialogue system is implemented for a café customer satisfaction feedback survey. Both fuzzy and crisp semantic similarity measures are used within the dialogue engine to assess the accuracy and robustness of rule firing. Results from a 32-participant study show that the fuzzy measure improves rule matching within the dialogue system by 21.88% compared with the crisp measure known as STASIS, thus providing a more natural and fluid dialogue exchange.
Keywords— dialogue systems, conversational agents, fuzzy
semantic similarity measures, fuzzy natural language
I. INTRODUCTION
Dialogue Systems (DS) are applications that effectively replace human experts by interacting with users through natural language dialogue to provide a type of service or advice [1]. In order for a DS to engage with humans, it must be able to handle extended natural language dialogue relating to complex tasks and potentially engage in decision-making. In this sense, agents are helpful tools for human-machine interaction, allowing the input of data via natural language, processing sentences, and returning answers appropriately through text. DS, sometimes known as conversational agents, have been used in a wide range of applications such as customer service [1], help desk support [2], education [3,4,5,6], Cognitive Behavioural Therapy for young adults [7], insurance [8] and healthcare [9]. Dialogue understanding has become more valuable to companies as it has become easier to gain insights from unstructured text, ranging from Google's AutoML and Natural Language API [10] to Amazon's use of supervised machine learning to interpret natural language vocabulary correctly, reducing, for example, the detection of false positive responses [11]. For spoken DS, task-based systems which utilise deep reinforcement learning techniques in their dialogue management are also becoming more available to industry [12]. What makes a successful DS is the ability of the machine to understand and interpret the human's natural language response in the context of the conversation.
Traditionally, DS used a pattern matching method to determine the most suitable response through computation of rule strengths for all matched occurrences of scripted patterns in the context of the system. The pattern matching approach has proven effective and flexible for developing extended dialogue applications [1, 13, 14], especially when coupled with rule-based matching algorithms to produce controlled responses and sustain dialogues with users. However, scripting patterns is a laborious and time-consuming task with many flaws. More recently, some DS have opted to use short text semantic similarity measures (STSM) in place of pattern matching [6, 14, 15]. Utilising STSM within a DS is more effective than other techniques because it replaces the scripted patterns with a few natural language sentences in each rule. Evaluation of STSM based systems has been shown to improve the robustness of the system in terms of increasing the number of correctly fired rules, thus maintaining the conversational flow and increasing usability [15, 16]. However, traditional STSM do not sufficiently capture the fuzziness of natural language, i.e. human perception-based words. As a result, the fundamental meaning of the human utterance in the dialogue context can be misunderstood, causing the incorrect rule to fire, leading to an incorrect flow of conversation and even the wrong tasks being suggested.
Fuzzy Sentence Similarity Measures (FSSM) are algorithms that compare two or more short texts or phrases which contain human perception-based words, and return a numeric measure of similarity of meaning between them (composed of both semantic and syntactic elements). This paper utilises one such measure known as FUSE (FUzzy Similarity mEasure) [17], which uses both WordNet [18] and a series of fuzzy ontologies modelled from human representations using Interval Type-2 fuzzy sets [17]. FUSE has been shown to model intra-personal and inter-personal uncertainties of fuzzy words representative of natural language.
This paper describes the creation and evaluation of a simple DS which utilises the FUSE measure to match human utterances to a set of fuzzy phrases within a rule-based system. The aim is to improve the robustness of rule matching within the DS compared with the use of a crisp similarity measure in a market research scenario, where the capture of rich descriptive dialogue is important in gaining customer insight. A fuzzy DS can be used to automate the analysis of unstructured answers given to open-ended questions, allowing for richer insight when collecting survey data. For example, an understanding of the dialogue can lead to further probing to obtain more descriptive answers that provide greater insight into why a particular answer was given. This paper aims to address the following research question:
Can a Fuzzy Sentence Similarity Measure (FSSM) be incorporated into a dialogue system to improve rule matching ability from user utterances compared with a traditional STSM?
This paper is organised as follows: Section II provides a brief overview of dialogue systems and illustrates the differences between the use of traditional pattern matching and semantic similarity measures in the management of the human-machine conversation. Section III describes the design of a simple dialogue system, incorporating an FSSM, for collating human responses to evaluate customer feedback in a café, and Section IV describes the experimental methodology and results. Finally, Section V presents the conclusions and future work.
II. DIALOGUE SYSTEMS
In this section, we briefly examine the dialogue engine within the DS, which is used to maintain conversational flow. We review and highlight typical problems associated with pattern matching and outline why the use of STSM overcomes some of them.
A) Strengths and Weaknesses of Pattern Matching
A dialogue system, sometimes referred to as a conversational agent (CA), is a computer program which interacts with a user through natural language dialogue and provides some form of service [1, 2, 19, 20, 21]. However, such systems typically suffer from high maintenance costs when updating dialogue patterns for new scenarios, due to the huge number of language patterns within the scripts. Typically, DS work from scripts, which are organised into contexts consisting of hierarchically organised rules comprising patterns and associated responses (see Fig. 1 for an example of a pattern matching rule). Scripts need to capture a wide variety of inputs and hence many rules are required, each of which deals with an input pattern, its possible variations, and an associated response [5, 14, 16]. InfoChat is one such pattern matching system which utilises the sophisticated PatternScript scripting language [22] and has been adapted over the years for use in intelligent conversational tutoring systems [6]. Fig. 1 shows an example of a pattern matching rule, <tle-help-desk>, which has been encoded using the scripting language provided with the agent InfoChat. The rule uses default values for (a)ctivation and (p)attern matching strength, has a (c)ondition (that the variable att_name has a value) and a response consisting of both a text and the setting of a variable <set att_service_type PC_fault>. Fig. 1 illustrates that scripting patterns is inefficient, resulting in domain instability and high maintenance costs. Whilst pattern matching scripting engines are a mature technology, robust to some degree to expected user input, scripting is an art form and requires good knowledge of the language and the ability to perform in-depth knowledge engineering of the domain [1, 4, 16].
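InfoChat's PatternScript is proprietary and its exact matching semantics are not reproduced here, but the idea of scoring an utterance against wildcard patterns can be sketched as follows. This is a minimal, hypothetical simplification using shell-style wildcards; the names match_pattern and best_rule are illustrative, not part of InfoChat:

```python
import fnmatch


def match_pattern(pattern: str, utterance: str) -> bool:
    """Case-insensitive wildcard match in the spirit of PatternScript's '*'."""
    return fnmatch.fnmatch(utterance.lower().strip(), pattern.lower().strip())


def best_rule(patterns: dict, utterance: str):
    """Return (pattern, strength) of the strongest matching pattern, or None.

    `patterns` maps a wildcard pattern to its matching strength (e.g. p:50).
    """
    hits = [(p, s) for p, s in patterns.items() if match_pattern(p, utterance)]
    return max(hits, key=lambda ps: ps[1]) if hits else None
```

For example, with the patterns of Fig. 1, the utterance "There is something wrong with my PC" would match the pattern `* something wrong * pc*`. The sketch also makes the maintenance burden visible: every surface variation of the input needs its own pattern entry.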
B) Semantic Similarity Measures
In a semantic dialogue system, each rule is matched in accordance with a pre-determined semantic similarity threshold, which is set initially through empirical evaluation and depends upon the sensitivity of rules within a context. A simple rule (Fig. 2) comprises a set of prototypical sentences, (s), where the similarity with the user utterance is calculated using an STSM. Each rule has a series of responses, (r), which are provided to the user and can be randomly selected. Each rule also has an associated default rule, which fires if the user utterance fails to match any prototypical sentences within the rule. O'Shea et al. [15] devised a semantic scripting language which incorporated an STSS by adapting the pattern matching language of InfoChat [16]; the adapted language retains the ability to extract patterns to set variables, set rule conditions, and freeze, promote and demote rules.

rule <tle-help-desk>
a:0.01
c:%att_name%
p:50 * something wrong * pc*
p:50 * something wrong * pc
p:50 * something wrong * computer*
p:50 * computer* * faulty*
p:50 * pc* faulty*
p:50 * computer* broken*
p:50 * pc* broken*
p:50 * computer *nt work*
p:50 * pc* *nt work*
p:50 * curing * fault * computer*
p:50 * curing * fault * pc*
p:50 * fault* * pc*
p:50 * fault* computer*
p:50 * pc * fault*
p:50 * computer * fault*
p:50 * problem * pc*
p:50 * problem * computer*
r: Please can you explain what the problem is? *<set att_service_type PC_fault>
Fig. 1 Pattern matching rule

rule <tle-help-desk>
c:%att_name%
s: There is a problem with my computer
r: Please can you explain what the problem is? *<set att_service_type PC_fault>
Fig. 2 Semantic rule
In a semantic system, prototypical sentence rules are compared with user utterances using a pre-selected STSS algorithm, and the rule with the highest similarity match fires. The most obvious benefit of using semantic rules is that no patterns are required and, more importantly, the semantic meaning of the utterance can be captured and acted upon within the dialogue context. Aljameel [4] used a hybrid similarity approach, combining an STSM with limited patterns, to construct an Arabic conversational intelligent tutoring system for the education of autistic children. The conversational agent processed Arabic utterances using a novel crisp STSM which utilised the cosine similarity measure to solve the word order issue associated with the Arabic language. Consequently, this reduced the number of scripts and rules required. Through empirical evaluation of two versions of the system, the use of an STSM reduced the number of unrecognised human utterances to 5.4% compared to 38% in the pattern scripted version and, hence, the system's incorrect responses were reduced to 3.6% compared to 10.2% in the pattern scripted version [4]. Similar improvements from utilising an STSM within DS are also reported in [23]. In this paper, we replace the traditional semantic similarity measure with a fuzzy semantic similarity measure to evaluate the effectiveness of a DS through a reduction in incorrect responses and unrecognised human utterances compared with using an STSM.
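The rule-selection mechanism described above can be sketched as follows. The word-overlap similarity here is only a stand-in for a real STSM such as STASIS or FUSE, and the function names and the 0.3 threshold are illustrative assumptions, not values from the paper:

```python
def jaccard_sim(a: str, b: str) -> float:
    """Toy word-overlap similarity; a real system would call STASIS or FUSE here."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def fire_rule(rules: dict, utterance: str, threshold: float = 0.3) -> str:
    """Score each rule by its best prototypical sentence; fire the top-scoring
    rule if it clears the similarity threshold, else fall back to the default rule.

    `rules` maps a rule name to its list of prototypical (s:) sentences.
    """
    scores = {name: max(jaccard_sim(s, utterance) for s in sents)
              for name, sents in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "default-rule"
```

The key design point is that each rule needs only a handful of prototypical sentences rather than dozens of wildcard patterns, because the similarity measure, not the script author, absorbs the surface variation in user input.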
III. A SIMPLE DIALOGUE SYSTEM FOR COLLATING USER
RESPONSES
A) Overview
In this section, we describe a simple question and answer dialogue system that utilises the FUSE semantic similarity measure [17] to match user utterances to different categories of responses to each question. The dialogue structure is therefore a linear sequence of questions, where each question response has three possible branches. The aim is to distinguish between human perceptions of fuzzy words in nine categories to assess whether the correct rule fires in response to the natural language used within the human utterance. FUSE [17] is an ontology-based similarity measure that uses Interval Type-2 fuzzy sets to model relationships between categories of human perception-based words. The FUSE algorithm identifies fuzzy words in a human utterance and determines their similarity in the context of both the semantic and syntactic construction of the sentence. Currently FUSE consists of nine fuzzy categories, each containing a series of fuzzy words. These categories are Size/Distance, Age, Temperature, Worth, Level of Membership, Frequency, Brightness, Strength and Speed. The initial selection and methodology for word population can be found in [17]. An experiment originally described in [17] was used to capture human ratings to create the fuzzy ontology for these categories, where words were modelled using the Hao-Mendel Approach (HMA) with Interval Type-2 fuzzy sets [24]. A full description of the FUSE algorithm and the general approach by which the fuzzy word models and measures in each category were derived is given in [17].
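Measures in the STASIS family, on which FUSE builds, typically combine a semantic similarity component with a word-order (syntactic) component as a weighted sum. A minimal sketch of that final combination step is given below; the weight value 0.85 is illustrative only and is not the published FUSE setting, and computing the two components themselves (via WordNet and the fuzzy ontologies) is the substantial part omitted here:

```python
def combined_similarity(semantic: float, syntactic: float,
                        zeta: float = 0.85) -> float:
    """Blend semantic and word-order similarity as S = zeta*Ss + (1 - zeta)*Sr.

    `zeta` weights the semantic component; 0.85 is an illustrative value,
    reflecting the common finding that semantics dominates word order.
    """
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return zeta * semantic + (1 - zeta) * syntactic
```

For example, an utterance pair with high semantic overlap but different word order would still score highly, which is the behaviour a dialogue engine wants when matching free-form replies to prototypical sentences.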
B) Design of a Dialogue System for Café Feedback
In order to establish whether an FSSM could be used in a dialogue system, a simple question and answer system was designed to obtain feedback from participants who visited a local café. This was done using a knowledge engineering approach and involved gathering information about typical questions asked in online customer satisfaction questionnaires concerning satisfaction levels in high street cafés. Existing survey questions were a mixture of dichotomous questions, multiple choice, Likert scale questions and free text. Within the proposed café feedback DS, each question selected had to be transformed into one which would allow the user to provide descriptive textual answers, in order to gather as much data as possible to evaluate the impact of the fuzzy semantic measure. To ensure all the categories in FUSE were covered, nine questions were created (Table I), each one covering responses that would contain words, or synonyms of words, from each fuzzy category. Each question formulates a question-rule within the DS, where each rule can have three responses which represent full coverage of the categories as defuzzified word values obtained through human experts and Type-2 modelling using the HMA approach [17].
The rule responses were divided into three thresholds of high, medium and low, and words (and word synonyms) within each category fall under one of these thresholds. The threshold for each category varies, because the number of words and measurements in each category varies (dependent on human perceptions [17]). The thresholds in each of the nine categories were therefore selected based on the words in that specific category. An example is shown in Figures 3 and 4 for the two categories Frequency and Worth. Considering Fig. 3, for the category Frequency, the high threshold begins at [+1] and ends at [+0.40], with the last word to fall in this threshold being Everytime; the next word after this, which begins the mid threshold, is Occasionally at [+0.39], and this threshold continues down to [-0.20] (even though this is a negative value, it still falls in the mid threshold for this category); the low threshold then starts at [-0.21] and ends at [-1]. Examining Fig. 4 for the category Worth, the high threshold starts at [+1] and ends at [+0.20], the mid threshold begins at [+0.19] and ends at [-0.20], and the low threshold begins at [-0.21] and ends at [-1]. Thus there was not a single fixed threshold for all nine categories, as the words and their values varied in each category. In order to determine the specific high, medium and low thresholds for each fuzzy category, two English language experts independently grouped the words for each category. In the case of disagreement, a third expert was asked to cast the deciding vote.

TABLE I: CAFÉ FEEDBACK DIALOGUE QUESTIONS MAPPED TO FUZZY CATEGORIES
Q1 (Size/Distance): Using descriptive words, how would you describe the size of the queue?
Q2 (Temperature): How would you describe the temperature of the cafe?
Q3 (Brightness): How would you describe the brightness of the cafe?
Q4 (Age): Using descriptive words, how would you describe the age of the barista that served you?
Q5 (Speed): Once you placed your order, how quickly was your drink made and served to you?
Q6 (Strength): Looking up from your screen to the first person you see, how would you describe their physical strength?
Q7 (Frequency): How frequently do you visit this cafe?
Q8 (Level of Membership): How did today's visit meet your expectation?
Q9 (Worth): How would you describe your experience overall today?
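Mapping a defuzzified word value onto its high/mid/low band can be sketched from the Frequency and Worth boundaries given above. This is a hypothetical helper for illustration; in the paper the bands come from expert grouping per category, not from a fixed rule:

```python
# Per-category band boundaries on the defuzzified word scale [-1, +1],
# taken from the Frequency and Worth examples in the text:
#   frequency: high [+1, +0.40], mid [+0.39, -0.20], low [-0.21, -1]
#   worth:     high [+1, +0.20], mid [+0.19, -0.20], low [-0.21, -1]
BANDS = {
    "frequency": {"high": 0.40, "mid": -0.20},
    "worth":     {"high": 0.20, "mid": -0.20},
}


def band(category: str, value: float) -> str:
    """Map a defuzzified word value to its high/mid/low band for a category."""
    cuts = BANDS[category]
    if value >= cuts["high"]:
        return "high"
    if value >= cuts["mid"]:
        return "mid"
    return "low"
```

For instance, Occasionally at [+0.39] falls in the Frequency mid band because it sits just below the +0.40 cut, matching the boundary behaviour described above.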
C) Scripting
Each question (Table I) was scripted into a context which represented a category. Three English prototypical sentences were used in each rule to enable coverage of the high, medium or low threshold. In addition, there were initialisation and conclusion contexts. Fig. 5 shows three rules from the Size/Distance category. Each dialogue exchange between human and machine generated a human utterance that was compared to the prototypical sentences in each rule. In each context, the rule whose (s)entence gave the highest similarity score compared with the human utterance was analysed and fired through FUSE. An attribute is then set, i.e. att_size-distance-high becomes true if Default-rule1 fires.

Fig 3. Frequency threshold
Fig 4. Worth threshold

<Default-rule1><size/distance>
s: It was long
s: It was huge
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-high>
c: temperature_context
<Default-rule2><size/distance>
s: It was average
s: It was regular
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-medium>
c: temperature_context
<Default-rule3><size/distance>
s: It was tiny
s: It was small
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-low>
c: temperature_context
Fig 5. Size/Distance category rules

Fig 6. Simple Interface Design
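The linear question-and-answer flow described in this section can be sketched as a simple loop. The structure below is a hypothetical simplification: a pluggable similarity function stands in for FUSE, answers are passed in directly rather than collected interactively, and the names run_survey and the 0.3 threshold are illustrative assumptions:

```python
def run_survey(script, answers, similarity, threshold: float = 0.3) -> dict:
    """Drive a linear survey dialogue and record the attribute set per question.

    script:  list of (question, {attribute: [prototypical sentences]}) pairs,
             one entry per context, as in the Size/Distance rules of Fig. 5.
    answers: one user utterance per question (stands in for live input).
    similarity: callable(sentence, utterance) -> float, e.g. FUSE or STASIS.
    """
    attributes = {}
    for (question, rules), reply in zip(script, answers):
        # Score each rule by its best-matching prototypical sentence.
        scored = {attr: max(similarity(s, reply) for s in sents)
                  for attr, sents in rules.items()}
        best = max(scored, key=scored.get)
        # Fire the best rule if it clears the threshold, else the default.
        attributes[question] = best if scored[best] >= threshold else "default"
    return attributes
```

Each fired rule sets one attribute (e.g. att_size-distance-high), so the returned dictionary is the structured survey result that the free-text dialogue was designed to capture.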

References
Sentence similarity based on semantic nets and corpus statistics (Journal Article). TL;DR: Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.
Survey of conversational agents in health (Journal Article). TL;DR: An agent application taxonomy was developed, the main challenges in the field were identified, and the main types of dialogue and contexts related to conversational agents in health were defined.
An adaptation algorithm for an intelligent natural language tutoring system (Journal Article). TL;DR: The results show that learners experiencing a conversational tutorial personalised to their learning styles performed significantly better during the tutorial than those with an unmatched tutorial.
Frequently Asked Questions
Q1. What are the contributions mentioned in the paper "Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures"?

The key impact of the work presented in this paper is to use fuzzy semantic similarity measures embedded within a dialogue system to allow a machine to semantically comprehend human utterances in a given context and thus communicate more effectively with a human in a specific domain using natural language. In this work, a simple question and answer dialogue system is implemented for a café customer satisfaction feedback survey. Results from a 32-participant study show that the fuzzy measure improves rule matching within the dialogue system by 21.88%.

For the utterance "The light level of the cafe is not bright", both FUSE and STASIS matched the high threshold because of the word "bright", when in effect, due to the use of the word "not", it actually means it was dark.

For the utterance "The physical appearance of the barista tells that she was in her 30's", both FUSE and STASIS classified it as belonging to the low category, consisting of words such as baby, young, child, etc., when according to the two English language experts it should be in the mid threshold, containing words such as adult, middle-aged, grown-up, etc.

Given the original research question, the authors conclude that a Fuzzy Sentence Similarity Measure (FSSM) can be incorporated into a dialogue system to improve rule matching ability from a user utterance compared with a traditional STSM.

D) Effect on Usability: All participants completed a short usability survey comprising 13 Likert scale questions with allowable free text, following completion of the task.

An additional example of negation leading to an incorrect rule firing was when the DS asked the question relating to the category Strength. User utterance: "I would describe them as lean and not very strong."
