
Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures


Adel, Naeemeh (ORCID: https://orcid.org/0000-0003-4449-7410), Crockett, Keeley (ORCID: https://orcid.org/0000-0003-1941-6201), Chandran, David and Carvalho, Joao (2020) Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures. In: IEEE World Congress on Computational Intelligence - IEEE FUZZ 2020, 19 July 2020 - 24 July 2020, Glasgow, UK (virtual congress).
Downloaded from: https://e-space.mmu.ac.uk/625464/
Publisher: IEEE
DOI: https://doi.org/10.1109/FUZZ48607.2020.9177605
Please cite the published version
https://e-space.mmu.ac.uk

Interpreting Human Responses in Dialogue Systems
using Fuzzy Semantic Similarity Measures
Naeemeh Adel, Keeley Crockett
School of Computing, Mathematics and Digital Technology,
Manchester Metropolitan University, Chester Street,
Manchester, M1 5GD, UK
N.Adel@mmu.ac.uk
David Chandran
Institute of Psychiatry, Psychology & Neuroscience, King's
College London, 16 De Crespigny Park, London,
SE5 8AF, UK
Joao P. Carvalho
INESC-ID / Instituto Superior Tecnico, Universidade de
Lisboa, Portugal
Abstract— Dialogue systems are automated systems that interact with humans using natural language. Much work has been done on dialogue management and learning using a range of computational intelligence based approaches; however, the complexity of human dialogue in different contexts still presents many challenges. The key impact of the work presented in this paper is to use fuzzy semantic similarity measures embedded within a dialogue system to allow a machine to semantically comprehend human utterances in a given context and thus communicate more effectively with a human in a specific domain using natural language. To achieve this, perception-based words should be understood by a machine in the context of the dialogue. In this work, a simple question and answer dialogue system is implemented for a café customer satisfaction feedback survey. Both fuzzy and crisp semantic similarity measures are used within the dialogue engine to assess the accuracy and robustness of rule firing. Results from a 32-participant study show that the fuzzy measure improves rule matching within the dialogue system by 21.88% compared with the crisp measure known as STASIS, thus providing a more natural and fluid dialogue exchange.
Keywords— dialogue systems, conversational agents, fuzzy
semantic similarity measures, fuzzy natural language
I. INTRODUCTION
Dialogue Systems (DS) are applications that effectively replace human experts by interacting with users through natural language dialogue to provide a type of service or advice [1]. In order for a DS to engage with humans, it must be able to handle extended natural language dialogue relating to complex tasks and potentially engage in decision-making. In this sense, agents are helpful tools for human-machine interaction, allowing the input of data via natural language, processing sentences, and returning answers appropriately through text. DS, sometimes known as conversational agents, have been used in a wide range of applications such as customer service [1], help desk support [2], education [3,4,5,6], Cognitive Behavioural Therapy for young adults [7], insurance [8] and healthcare [9]. Dialogue understanding has become more valuable to companies as it has become easier to gain insights from unstructured text, ranging from Google's AutoML and Natural Language API [10] to Amazon's use of supervised machine learning to interpret natural language vocabulary correctly, reducing, for example, the detection of false positive responses [11]. For spoken DS, task-based systems which utilise deep reinforcement learning techniques in their dialogue management are also becoming more available to industry [12]. What makes a successful DS is the ability of the machine to understand and interpret the human's natural language response in the context of the conversation.
Traditionally, DS used a pattern matching method to determine the most suitable response through computation of rule strengths for all matched occurrences of scripted patterns in the context of the system. The pattern matching approach has proven effective and flexible for developing extended dialogue applications [1, 13, 14], especially when coupled with rule-based matching algorithms to produce controlled responses and sustain dialogues with users. However, scripting patterns is a laborious and time-consuming task with many flaws. More recently, some DS have opted to use short text semantic similarity measures (STSM) in place of pattern matching [6, 14, 15]. Utilising STSM within a DS is more effective than other techniques because it replaces the scripted patterns with a few natural language sentences in each rule. Evaluation of STSM based systems has been shown to improve the robustness of the system in terms of increasing the number of correctly fired rules, thus maintaining the conversational flow and increasing usability [15, 16]. However, traditional STSM do not sufficiently capture the fuzziness of natural language, i.e. human perception-based words. As a result, the fundamental meaning of the human utterance in the dialogue context can be misunderstood, causing the incorrect rule to fire, leading to an incorrect flow of conversation and even the wrong tasks being suggested.
Fuzzy Sentence Similarity Measures (FSSM) are algorithms that compare two or more short texts or phrases which contain human perception-based words, and return a numeric measure of similarity of meaning between them (composed of both semantic and syntactic elements). This paper utilises one such measure known as FUSE (FUzzy Similarity mEasure) [17], which uses both WordNet [18] and a series of fuzzy ontologies modelled from human representations using Interval Type-2 fuzzy sets [17]. FUSE has been shown to model intra-personal and inter-personal uncertainties of fuzzy words representative of natural language.
This paper describes the creation and evaluation of a simple DS which utilises the FUSE measure to match human utterances to a set of fuzzy phrases within a rule-based system. The aim is to improve the robustness of rule matching within the DS compared with the use of a crisp similarity measure in a market research scenario, where the capture of rich descriptive dialogue is important in gaining customer insight. A fuzzy DS can be used to automate the analysis of unstructured answers given to open-ended questions, allowing for richer insight when collecting survey data. For example, an understanding of the dialogue can lead to further probing to obtain more descriptive answers that provide greater insight into why a particular answer was given. This paper aims to address the following research question:
Can a Fuzzy Sentence Similarity Measure (FSSM) be incorporated into a dialogue system to improve rule matching ability from user utterances compared with a traditional STSM?
This paper is organised as follows: Section II provides a brief overview of dialogue systems and illustrates the differences between the use of traditional pattern matching and semantic similarity measures in the management of the human-machine conversation. Section III describes the design of a simple dialogue system, incorporating an FSSM, for collating human responses to evaluate customer feedback in a café, and Section IV describes the experimental methodology and results. Finally, Section V presents the conclusions and future work.
II. DIALOGUE SYSTEMS
In this section, we briefly examine the dialogue engine within the DS, which is used to maintain conversational flow. We review and highlight typical problems associated with pattern matching and outline why the use of STSM overcomes some of them.
A) Strengths and Weaknesses of Pattern Matching
A dialogue system, sometimes referred to as a conversational agent (CA), is a computer program which interacts with a user through natural language dialogue and provides some form of service [1, 2, 19, 20, 21]. However, such systems typically suffer from high maintenance costs when updating dialogue patterns for new scenarios, due to the huge number of language patterns within the scripts. Typically, DS work from scripts, which are organised into contexts consisting of hierarchically organised rules comprising patterns and associated responses (see Fig. 1 for an example of a pattern matching rule). Scripts need to capture a wide variety of inputs and hence many rules are required, each of which deals with an input pattern, its possible variations, and an associated response [5, 14, 16]. InfoChat is one such pattern matching system which utilises the sophisticated PatternScript scripting language [22] and has been adapted over the years for use in intelligent conversational tutoring systems [6]. Fig. 1 shows an example of a pattern matching rule, <tle-help-desk>, which has been encoded using the scripting language provided with the agent InfoChat. The rule uses default values for (a)ctivation and (p)attern matching strength, has a (c)ondition (that the variable att_name has a value) and a response consisting of both a text and the setting of a variable <set att_service_type PC_fault>. Fig. 1 illustrates that scripting patterns is inefficient, resulting in domain instability and high maintenance costs. Whilst pattern matching scripting engines are a mature technology, robust to some degree to expected user input, scripting is an art form and requires good knowledge of the language and the ability to perform in-depth knowledge engineering of the domain [1, 4, 16].
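InfoChat's PatternScript is proprietary and its exact matching semantics are not reproduced here, but the idea of scoring an utterance against wildcard patterns can be sketched as follows. This is a minimal, hypothetical simplification using shell-style wildcards; the names match_pattern and best_rule are illustrative, not part of InfoChat:

```python
import fnmatch


def match_pattern(pattern: str, utterance: str) -> bool:
    """Case-insensitive wildcard match in the spirit of PatternScript's '*'."""
    return fnmatch.fnmatch(utterance.lower().strip(), pattern.lower().strip())


def best_rule(patterns: dict, utterance: str):
    """Return (pattern, strength) of the strongest matching pattern, or None.

    `patterns` maps a wildcard pattern to its matching strength (e.g. p:50).
    """
    hits = [(p, s) for p, s in patterns.items() if match_pattern(p, utterance)]
    return max(hits, key=lambda ps: ps[1]) if hits else None
```

For example, with the patterns of Fig. 1, the utterance "There is something wrong with my PC" would match the pattern `* something wrong * pc*`. The sketch also makes the maintenance burden visible: every surface variation of the input needs its own pattern entry.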
B) Semantic Similarity Measures
In a semantic dialogue system, each rule is matched in accordance with a pre-determined semantic similarity threshold, which is set initially through empirical evaluation and depends upon the sensitivity of rules within a context. A simple rule (Fig. 2) comprises a set of prototypical sentences, (s), where the similarity with the user utterance is calculated using an STSM. Each rule has a series of responses, (r), which are provided to the user and can be randomly selected. Each rule also has an associated default rule, which fires if the user utterance fails to match any prototypical sentences within the rule. O'Shea et al. [15] devised a semantic scripting language which incorporated an STSS by adapting the pattern matching language of InfoChat [16]; the adapted language retains the ability to extract patterns to set variables, set rule conditions, and freeze, promote and demote rules.

rule <tle-help-desk>
a:0.01
c:%att_name%
p:50 * something wrong * pc*
p:50 * something wrong * pc
p:50 * something wrong * computer*
p:50 * computer* * faulty*
p:50 * pc* faulty*
p:50 * computer* broken*
p:50 * pc* broken*
p:50 * computer *nt work*
p:50 * pc* *nt work*
p:50 * curing * fault * computer*
p:50 * curing * fault * pc*
p:50 * fault* * pc*
p:50 * fault* computer*
p:50 * pc * fault*
p:50 * computer * fault*
p:50 * problem * pc*
p:50 * problem * computer*
r: Please can you explain what the problem is? *<set att_service_type PC_fault>
Fig. 1 Pattern matching rule

rule <tle-help-desk>
c:%att_name%
s: There is a problem with my computer
r: Please can you explain what the problem is? *<set att_service_type PC_fault>
Fig. 2 Semantic rule
In a semantic system, prototypical sentence rules are compared with user utterances using a pre-selected STSS algorithm, and the rule with the highest similarity match fires. The most obvious benefit of using semantic rules is that no patterns are required and, more importantly, the semantic meaning of the utterance can be captured and acted upon within the dialogue context. Aljameel [4] used a hybrid similarity approach, combining an STSM with limited patterns, to construct an Arabic conversational intelligent tutoring system for the education of autistic children. The conversational agent processed Arabic utterances using a novel crisp STSM which utilised the cosine similarity measure to solve the word order issue associated with the Arabic language. Consequently, this reduced the number of scripts and rules required. Through empirical evaluation of two versions of the system, the use of an STSM reduced the number of unrecognised human utterances to 5.4% compared to 38% in the pattern scripted version and, hence, the system's incorrect responses were reduced to 3.6% compared to 10.2% in the pattern scripted version [4]. Similar improvements from utilising an STSM within DS are also reported in [23]. In this paper, we replace the traditional semantic similarity measure with a fuzzy semantic similarity measure to evaluate the effectiveness of a DS through a reduction in incorrect responses and unrecognised human utterances compared with using an STSM.
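The rule-selection mechanism described above can be sketched as follows. The word-overlap similarity here is only a stand-in for a real STSM such as STASIS or FUSE, and the function names and the 0.3 threshold are illustrative assumptions, not values from the paper:

```python
def jaccard_sim(a: str, b: str) -> float:
    """Toy word-overlap similarity; a real system would call STASIS or FUSE here."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def fire_rule(rules: dict, utterance: str, threshold: float = 0.3) -> str:
    """Score each rule by its best prototypical sentence; fire the top-scoring
    rule if it clears the similarity threshold, else fall back to the default rule.

    `rules` maps a rule name to its list of prototypical (s:) sentences.
    """
    scores = {name: max(jaccard_sim(s, utterance) for s in sents)
              for name, sents in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "default-rule"
```

The key design point is that each rule needs only a handful of prototypical sentences rather than dozens of wildcard patterns, because the similarity measure, not the script author, absorbs the surface variation in user input.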
III. A SIMPLE DIALOGUE SYSTEM FOR COLLATING USER
RESPONSES
A) Overview
In this section, we describe a simple question and answer dialogue system that utilises the FUSE semantic similarity measure [17] to match user utterances to different categories of responses to each question. The dialogue structure is therefore a linear sequence of questions, where each question response has three possible branches. The aim is to distinguish between human perceptions of fuzzy words in nine categories to assess whether the correct rule fires in response to the natural language used within the human utterance. FUSE [17] is an ontology-based similarity measure that uses Interval Type-2 fuzzy sets to model relationships between categories of human perception-based words. The FUSE algorithm identifies fuzzy words in a human utterance and determines their similarity in the context of both the semantic and syntactic construction of the sentence. Currently FUSE consists of nine fuzzy categories, each containing a series of fuzzy words. These categories are Size/Distance, Age, Temperature, Worth, Level of Membership, Frequency, Brightness, Strength and Speed. The initial selection and methodology for word population can be found in [17]. An experiment originally described in [17] was used to capture human ratings to create the fuzzy ontology for these categories, where words were modelled using the Hao-Mendel Approach (HMA) with Interval Type-2 fuzzy sets [24]. A full description of the FUSE algorithm and the general approach by which the fuzzy word models and measures in each category were derived is given in [17].
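Measures in the STASIS family, on which FUSE builds, typically combine a semantic similarity component with a word-order (syntactic) component as a weighted sum. A minimal sketch of that final combination step is given below; the weight value 0.85 is illustrative only and is not the published FUSE setting, and computing the two components themselves (via WordNet and the fuzzy ontologies) is the substantial part omitted here:

```python
def combined_similarity(semantic: float, syntactic: float,
                        zeta: float = 0.85) -> float:
    """Blend semantic and word-order similarity as S = zeta*Ss + (1 - zeta)*Sr.

    `zeta` weights the semantic component; 0.85 is an illustrative value,
    reflecting the common finding that semantics dominates word order.
    """
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return zeta * semantic + (1 - zeta) * syntactic
```

For example, an utterance pair with high semantic overlap but different word order would still score highly, which is the behaviour a dialogue engine wants when matching free-form replies to prototypical sentences.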
B) Design of a Dialogue System for Café Feedback
In order to establish whether an FSSM could be used in a dialogue system, a simple question and answer system was designed to obtain feedback from participants who visited a local café. This was done using a knowledge engineering approach and involved gathering information about typical questions asked in online customer satisfaction questionnaires concerning satisfaction levels in high street cafés. Existing survey questions were a mixture of dichotomous questions, multiple choice, Likert scale questions and free text. Within the proposed café feedback DS, each question selected had to be transformed into one which would allow the user to provide descriptive textual answers, in order to gather as much data as possible to evaluate the impact of the fuzzy semantic measure. To ensure all the categories in FUSE were covered, nine questions were created (Table I), each one covering responses that would contain words, or synonyms of words, from each fuzzy category. Each question formulates a question-rule within the DS, where each rule can have three responses which represent full coverage of the categories as defuzzified word values obtained through human experts and Type-2 modelling using the HMA approach [17].
The rule responses were divided into three thresholds of high, medium and low, and words (and word synonyms) within each category fall under one of these thresholds. The threshold for each category varies, because the number of words and measurements in each category varies (dependent on human perceptions [17]). The thresholds in each of the nine categories were therefore selected based on the words in that specific category. An example is shown in Figures 3 and 4 for the two categories Frequency and Worth. Considering Fig. 3, for the category Frequency, the high threshold begins at [+1] and ends at [+0.40], with the last word to fall in this threshold being Everytime; the next word after this, which begins the mid threshold, is Occasionally at [+0.39], and this threshold continues down to [-0.20] (even though this is a negative value, it still falls in the mid threshold for this category); the low threshold then starts at [-0.21] and ends at [-1]. Examining Fig. 4 for the category Worth, the high threshold starts at [+1] and ends at [+0.20], the mid threshold begins at [+0.19] and ends at [-0.20], and the low threshold begins at [-0.21] and ends at [-1]. Thus there was not a single fixed threshold for all nine categories, as the words and their values varied in each category. In order to determine the specific high, medium and low thresholds for each fuzzy category, two English language experts independently grouped the words for each category. In the case of disagreement, a third expert was asked to cast the deciding vote.

TABLE I: CAFÉ FEEDBACK DIALOGUE QUESTIONS MAPPED TO FUZZY CATEGORIES
Q1 (Size/Distance): Using descriptive words, how would you describe the size of the queue?
Q2 (Temperature): How would you describe the temperature of the cafe?
Q3 (Brightness): How would you describe the brightness of the cafe?
Q4 (Age): Using descriptive words, how would you describe the age of the barista that served you?
Q5 (Speed): Once you placed your order, how quickly was your drink made and served to you?
Q6 (Strength): Looking up from your screen to the first person you see, how would you describe their physical strength?
Q7 (Frequency): How frequently do you visit this cafe?
Q8 (Level of Membership): How did today's visit meet your expectation?
Q9 (Worth): How would you describe your experience overall today?
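Mapping a defuzzified word value onto its high/mid/low band can be sketched from the Frequency and Worth boundaries given above. This is a hypothetical helper for illustration; in the paper the bands come from expert grouping per category, not from a fixed rule:

```python
# Per-category band boundaries on the defuzzified word scale [-1, +1],
# taken from the Frequency and Worth examples in the text:
#   frequency: high [+1, +0.40], mid [+0.39, -0.20], low [-0.21, -1]
#   worth:     high [+1, +0.20], mid [+0.19, -0.20], low [-0.21, -1]
BANDS = {
    "frequency": {"high": 0.40, "mid": -0.20},
    "worth":     {"high": 0.20, "mid": -0.20},
}


def band(category: str, value: float) -> str:
    """Map a defuzzified word value to its high/mid/low band for a category."""
    cuts = BANDS[category]
    if value >= cuts["high"]:
        return "high"
    if value >= cuts["mid"]:
        return "mid"
    return "low"
```

For instance, Occasionally at [+0.39] falls in the Frequency mid band because it sits just below the +0.40 cut, matching the boundary behaviour described above.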
C) Scripting
Each question (Table I) was scripted into a context which represented a category. Three English prototypical sentences were used in each rule to enable coverage of the high, medium or low threshold. In addition, there were initialisation and conclusion contexts. Fig. 5 shows three rules from the Size/Distance category. Each dialogue exchange between human and machine generated a human utterance that was compared to the prototypical sentences in each rule. In each context, the rule whose (s)entence gave the highest similarity score compared with the human utterance was analysed and fired through FUSE. An attribute is then set, i.e. att_size-distance-high becomes true if Default-rule1 fires.

Fig 3. Frequency threshold
Fig 4. Worth threshold

<Default-rule1><size/distance>
s: It was long
s: It was huge
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-high>
c: temperature_context
<Default-rule2><size/distance>
s: It was average
s: It was regular
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-medium>
c: temperature_context
<Default-rule3><size/distance>
s: It was tiny
s: It was small
r: Using descriptive words, how would you describe the size of the queue? *<set att_size-distance-low>
c: temperature_context
Fig 5. Size/Distance category rules

Fig 6. Simple Interface Design
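The linear question-and-answer flow described in this section can be sketched as a simple loop. The structure below is a hypothetical simplification: a pluggable similarity function stands in for FUSE, answers are passed in directly rather than collected interactively, and the names run_survey and the 0.3 threshold are illustrative assumptions:

```python
def run_survey(script, answers, similarity, threshold: float = 0.3) -> dict:
    """Drive a linear survey dialogue and record the attribute set per question.

    script:  list of (question, {attribute: [prototypical sentences]}) pairs,
             one entry per context, as in the Size/Distance rules of Fig. 5.
    answers: one user utterance per question (stands in for live input).
    similarity: callable(sentence, utterance) -> float, e.g. FUSE or STASIS.
    """
    attributes = {}
    for (question, rules), reply in zip(script, answers):
        # Score each rule by its best-matching prototypical sentence.
        scored = {attr: max(similarity(s, reply) for s in sents)
                  for attr, sents in rules.items()}
        best = max(scored, key=scored.get)
        # Fire the best rule if it clears the threshold, else the default.
        attributes[question] = best if scored[best] >= threshold else "default"
    return attributes
```

Each fired rule sets one attribute (e.g. att_size-distance-high), so the returned dictionary is the structured survey result that the free-text dialogue was designed to capture.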

References
Sentence similarity based on semantic nets and corpus statistics (Journal Article). TL;DR: Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.
Survey of conversational agents in health (Journal Article). TL;DR: An agent application taxonomy was developed, the main challenges in the field were identified, and the main types of dialogue and contexts related to conversational agents in health were defined.
An adaptation algorithm for an intelligent natural language tutoring system (Journal Article). TL;DR: The results show that learners experiencing a conversational tutorial personalised to their learning styles performed significantly better during the tutorial than those with an unmatched tutorial.
Frequently Asked Questions
Q1. What are the contributions mentioned in the paper "Interpreting Human Responses in Dialogue Systems using Fuzzy Semantic Similarity Measures"?

The key impact of the work presented in this paper is to use fuzzy semantic similarity measures embedded within a dialogue system to allow a machine to semantically comprehend human utterances in a given context and thus communicate more effectively with a human in a specific domain using natural language. In this work, a simple question and answer dialogue system is implemented for a café customer satisfaction feedback survey. Results from a 32-participant study show that the fuzzy measure improves rule matching within the dialogue system by 21.88%.

For the utterance "The light level of the cafe is not bright", both FUSE and STASIS matched the high threshold because of the word "bright", when in effect, due to the use of the word "not", it actually means it was dark.

For the utterance "The physical appearance of the barista tells that she was in her 30's", both FUSE and STASIS classified it as belonging to the low category, consisting of words such as baby, young, child, etc., when according to the two English language experts it should be in the mid threshold, containing words such as adult, middle-aged, grown-up, etc.

Given the original research question, the authors conclude that a Fuzzy Sentence Similarity Measure (FSSM) can be incorporated into a dialogue system to improve rule matching ability from a user utterance compared with a traditional STSM.

D) Effect on Usability: All participants completed a short usability survey comprising 13 Likert scale questions with allowable free text, following completion of the task.

An additional example of negation leading to an incorrect rule firing was when the DS asked the question relating to the category Strength. User utterance: "I would describe them as lean and not very strong."
