

Using the Interaction Rhythm as a Natural
Reinforcement Signal for Social Robots:
A Matter of Belief
Antoine Hiolle¹, Lola Cañamero¹, Pierre Andry², Arnaud Blanchard², Philippe Gaussier²

¹ Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire
{a.hiolle, l.canamero}@herts.ac.uk

² ETIS, ENSEA, Université de Cergy-Pontoise, CNRS
{andry, gaussier, blanchard}@ensea.fr
Abstract. In this paper, we present the results of a pilot study of a human-robot interaction experiment where the rhythm of the interaction is used as a reinforcement signal to learn sensorimotor associations. The algorithm uses breaks and variations in the rhythm at which the human is producing actions. The concept is based on the hypothesis that a constant rhythm is an intrinsic property of a positive interaction, whereas a break reflects a negative event. Subjects from various backgrounds interacted with a NAO robot, teaching it to mirror their actions by learning the correct sensorimotor associations. The results show that in order for the rhythm to be a useful reinforcement signal, the subjects have to be convinced that the robot is an agent with which they can act naturally, using their voice and facial expressions as cues to help it understand the correct behaviour to learn. When the subjects do behave naturally, the rhythm and its variations truly reflect how well the interaction is going and help the robot learn efficiently. These results mean that non-expert users can interact naturally and fruitfully with an autonomous robot if the interaction is believed to be natural, without any technical knowledge of the cognitive capacities of the robot.
1 Introduction

The question of how to build robots that are useful and adaptive in our socially situated environment is of growing interest. Indeed, in a not so distant future, humans will have to interact daily with robots in various settings. During these interactions, robots will have to gain information from humans, and humans will have to learn from robots. Concerning the non-verbal aspects of these interactions, the field of developmental robotics has been trying to develop and study algorithms and architectures that are as generalisable as possible, in order for these systems to be as minimal as possible, be that at the lower motor level [1] or at the motivational level [2]. The underlying principle of the field is to model phases and phenomena from the development of children and animals, in order to understand and take advantage of the adaptivity and efficiency we observe in them.
Within this framework, the authors have been interested in defining and testing how and when learning from a human partner can be achieved with the minimum amount of prior knowledge on the robot's part, as a young infant has to do in its early years. During this period, the most important partner the infant has is its mother, or primary caregiver. As described by Bowlby in [3], the infant uses the attachment figure, often the mother, as a secure base to explore and learn from its experiences in unknown situations. However, the question is how the mother elicits these positive responses and promotes healthier cognitive and socio-emotional development. One hypothesis is that the mother's sensitivity, as described in [4], that is, the consistency of the mother's behaviour and responses to stimuli, is crucial. The positive emotions and mutual delight that the mother promotes, together with her deep engagement, support a healthier development for the infant [5]. Additionally, in the still-face paradigm [6], a caretaker produces a neutral expression after a few minutes of interaction, which in turn produces a significant fall in the infant's positive responses. Other frameworks, such as the Double Video paradigm [7], measured the same responses when the synchrony of the interaction was altered by introducing time delays in the mother's response. This would indicate that synchrony and timing during a mother-infant non-verbal interaction are a strong indicator of the infant's pleasure and emotional responses.
Defining the notion of sensitivity in the context of human-robot interactions is far from trivial. The experiment discussed in this paper is based on our previous work [8][9][10], where we raised the question of how important the consistency of the human's behaviour is to the stability and accuracy of the learned sensorimotor associations. We attempt to unify these notions in order to build a general reinforcement signal that could be used by a robot in a large number of settings, which in turn would help humans interact with robots without any knowledge of how the robot's cognitive system is designed.
We here present an experiment where a NAO robot has to learn, without any prior knowledge, the correct sensorimotor associations in a "mirroring game". The actions of the human are mediated by a pink ball, and the robot uses the rhythm at which the human is performing a new action in order to reinforce the correct action to perform. This experiment is an extension of the work presented in [11], where the same algorithm was used in a human-computer interaction, and [12], where the setup was extended to work on an AIBO robot and then a NAO robot.

The results showed that the rhythm could be used as a reinforcement signal for the robot to learn the correct associations, even more so when the principle was explained to the subjects, since they would then explore all the possibilities they knew of. But the non-expert subjects, who were not used to interacting with robots and did not know the rule, did not significantly manage to teach the robot the associations.

If the rhythm of the interaction is, as hypothesized, an intrinsic component of a natural interaction, surely something was missing for the non-expert users. The modified version of this experiment presented here aims at discovering what was missing in the interaction for these subjects to succeed.
2 Architecture and experimental setup

Fig. 1: Experimental setup. The human partner is in front of the robot, moving a pink ball between the four different positions in the visual field. The robot learns the proper response to mirror the actions of the human partner.
In our setup, the robot is trying to learn to mirror the actions of the human partner, following the position of a pink ball in its visual field, as in Fig. 1. The robot has to learn the four different sensorimotor associations corresponding to the four possible positions (left arm up when the ball is in the top right of the visual field, left arm down when the ball is in the bottom right, and respectively for the left side). The learning algorithm itself functions as follows, and the main components are depicted in Fig. 2.
The robot has access to four different perceptions (ball in the top-left part of the visual field, ball top-right, ball bottom-left and ball bottom-right), to which it will associate an action. Every time an action is performed, the rhythm prediction component will reset and peak after a time corresponding to the last gap learned between two actions. To summarise, the robot learns in one shot the time elapsed between two different perceptions, and expects the next action to be performed after this precise duration. The reinforcement signal is calculated as the difference between the duration expected and the duration observed. The only prediction made here is the occurrence of the next action from the partner, which for the robot is a change in the current perception.
Fig. 2: Abstract representation of the robot architecture to learn the sensorimotor associations.
The reinforcement signal R(t) from the rhythm prediction module varies as a Gaussian centred on the time t, which is the time of the next predicted event (see [11] for more details). R(t) is then used to change the weights between two fully connected layers of neurons (the perception and the action to be performed). The synapses have a weight W_ij and a probability p_ij associated to them, and the rule used is the Probabilistic Learning rule. Using this rule, a fully connected neural network (perceptions connected to potential actions) behaves as follows:

∆p_ij = (ε + α × R(t)) × C_ij        (1)

p_ij(t) = H(p_ij + ∆p_ij)            (2)

with ε the learning speed, α the reinforcement factor and C_ij the average of the past activation of unit i. Then, if a random draw Rand is higher than the confidence, Rand > p_ij:

W_ij = 1 - W_ij                      (3)

p_ij = 1 - p_ij                      (4)

Using Rand promotes an exploratory behaviour when the confidence is low, and
a more exploitative behaviour when it is high.
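As an illustration of how Eqs. (1)-(4) could operate together with the rhythm reward, here is a minimal Python sketch; the matrix sizes, the constants, the running average used for C_ij and the interpretation of H as a saturation to [0, 1] are assumptions for illustration, not the authors' implementation.

import numpy as np

# Illustrative sketch of the probabilistic learning rule (Eqs. 1-4); constants,
# the running average used for C_ij, and H taken as a clip to [0, 1] are assumptions.
N_PERCEPTS, N_ACTIONS = 4, 4
EPSILON, ALPHA = 0.05, 0.5            # learning speed and reinforcement factor

W = np.random.randint(0, 2, (N_PERCEPTS, N_ACTIONS)).astype(float)  # binary weights W_ij
P = np.full((N_PERCEPTS, N_ACTIONS), 0.5)                           # confidences p_ij
C = np.zeros((N_PERCEPTS, N_ACTIONS))                               # average past activation

def rhythm_reward(timing_error, sigma=0.5):
    """R(t): Gaussian centred on the predicted time of the next event."""
    return np.exp(-(timing_error ** 2) / (2 * sigma ** 2))

def update(i, j, R):
    """Update the association between the active perception i and the performed action j."""
    C[i, j] = 0.9 * C[i, j] + 0.1                                           # recent co-activation
    P[i, j] = np.clip(P[i, j] + (EPSILON + ALPHA * R) * C[i, j], 0.0, 1.0)  # Eqs. (1)-(2)
    if np.random.rand() > P[i, j]:                                          # low confidence: explore
        W[i, j] = 1.0 - W[i, j]                                             # Eq. (3): flip the weight
        P[i, j] = 1.0 - P[i, j]                                             # Eq. (4): flip the confidence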
3 Experimental Design

The aim of the experiment is to assess if and how the human subjects are able to teach the four different sensorimotor associations without any explicit feedback or reinforcement signal being used by the robot, and without the human having prior knowledge of the signal used by the robot. If the humans are successful in that task, this will show that the rhythm is potentially an intrinsic component of non-verbal interaction that can help identify a successful interaction and allow a robot to learn without any specific reward.
In order to keep the subjects engaged and to make the robot's behaviour appear slightly more lifelike, we introduced a slow balancing movement on the robot: its torso would lean closer to the human partner and then slowly back away, at a low frequency modulated by the rewards obtained over time. Moreover, a notion of well-being has been added to the architecture in order to control the expressions of the robot. We therefore decouple the reward used to learn the sensorimotor associations from the overall behaviour of the robot. The robot will express happiness when the well-being is high, sadness when it is low, and boredom when it is low and the perceptions stay too stable, that is, when the human keeps repeating the same action over and over.
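As an illustration only, this expression selection can be sketched as a simple mapping; the thresholds, the function name and the behaviour at intermediate well-being levels below are assumptions, not the authors' values.

def choose_expression(well_being, perception_recently_changed, high=0.7, low=0.3):
    """Map the internal well-being level to an expression (illustrative thresholds)."""
    if well_being >= high:
        return "happiness"
    if well_being <= low:
        # Low well-being with an unchanging perception (the human repeating the same
        # action over and over) is expressed as boredom rather than sadness.
        return "boredom" if not perception_recently_changed else "sadness"
    return "neutral"   # intermediate well-being: no strong expression (assumption)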
Finally, one major change was made to the protocol of the experiment. Regardless of their background, all subjects heard the same guidelines, which are as follows: You will be asked to use the pink ball to teach the robot to mirror your actions. The robot is able to hear your voice, but does not understand words. The robot is able to see your face and what you are expressing. The robot will only respond to movements (a change of the position of the ball in its visual field). The LEDs in the eyes of the robot will reflect the quadrant where the robot perceives the ball, and are turned off when it cannot see the ball any more. Try and act as though you were teaching this to a 6 to 15 month old infant, who is able to process voices and faces. As a monitoring feature, we also reflected the expected rhythm in the LEDs of the robot, as an indicator for the experimenter to see how the system was performing. The LEDs would turn a brighter white the closer we got to the predicted action, and then fade the longer it was after the predicted action (see the sketch after this paragraph). The guidelines were modified to provide the human with existing, natural rewards (tone of voice and facial expressions) that they can use without having to be trained. This was also believed to raise their confidence in the overall capabilities of the robot.
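The monitoring LED behaviour can be illustrated with a small sketch; the ramp and fade constants are assumptions chosen only to show the shape of the brightness profile, not the values used on the robot.

import math

def led_brightness(t, t_predicted, sigma=0.5, fade=2.0):
    """Brightest close to the predicted action time, fading the longer we are past it."""
    if t <= t_predicted:
        # approaching the predicted action: brighten following a Gaussian ramp
        return math.exp(-((t_predicted - t) ** 2) / (2.0 * sigma ** 2))
    # after the predicted action: fade exponentially with the elapsed time
    return math.exp(-(t - t_predicted) / fade)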
We conducted a study with 10 subjects, with ages ranging from 23 to 60 years old and with various backgrounds. The robot used was the same as previously, the NAO robot (Fig. 1). The interaction would typically last ten minutes, and was ended by the experimenter. We recorded the value of the rhythm used by the human, the movements performed by the robot, and the rewards the system identified. Using this, we have enough data to know what

Citations

Journal ArticleDOI: Emotional body language displayed by artificial agents
TL;DR: Overall, these experiments confirmed that body language is an appropriate medium for robots to display emotions and suggest that an affect space for body expressions can be used to improve the expressiveness of humanoid robots.

Journal ArticleDOI: The Neurobehavioral and Social-Emotional Development of Infants and Children
TL;DR: This paper introduces three impressive tomes, The Neurobehavioral and Social-Emotional Development of Infants and Children, which attempt to integrate developmental theory with data from research studies on maternal-infant interactions.

Proceedings Article: Long-term human-robot interaction with young users
TL;DR: This paper introduces an approach that integrates multiple functional aspects necessary to implement temporally extended human-robot interaction in the setting of a paediatric ward, and presents the methodology for the implementation of a companion robot.

Proceedings ArticleDOI: Children's adaptation in multi-session interaction with a humanoid robot
TL;DR: An interesting phenomenon was observed during the experiment: most of the children soon adapted to the behaviors of the robot, in terms of speech timing, speed and tone, verbal input formulation, nodding, gestures, etc.
References

Book: Attachment and Loss, by John Bowlby.

Journal ArticleDOI: Sensitivity and Attachment: A Meta-Analysis on Parental Antecedents of Infant Attachment
TL;DR: This meta-analysis included 66 studies (N = 4,176) on parental antecedents of attachment security; the question was whether maternal sensitivity is associated with infant attachment security, and what the strength of this relation is.

Journal ArticleDOI: Intrinsic Motivation Systems for Autonomous Mental Development
TL;DR: The mechanism of Intelligent Adaptive Curiosity is presented, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress, thus permitting autonomous mental development.

Book: The neurobehavioral and social-emotional development of infants and children.
TL;DR: Tronick's Still Face Paradigm, as mentioned in this paper, has been widely used to study the development of infants' emotional capacities and coping, and to understand the nature of normal versus abnormal development.
Frequently Asked Questions

Q1. What have the authors contributed in "Using the interaction rhythm as a natural reinforcement signal for social robots: a matter of belief"?

In this paper, the authors present the results of a pilot study of a human-robot interaction experiment where the rhythm of the interaction is used as a reinforcement signal to learn sensorimotor associations.

Indeed, in the future, if robots meet an increasing number of humans and will themselves need assistance, knowing which partner is the most useful should be a clear advantage. In the future, the authors plan on confirming these results with a broader set of subjects, in age, technological and cultural background. The authors would then be able to study the possibility of conflicts between these signals and develop a system which would try and cope with the possible contradictions. The authors would also be interested to see with what kinds of different interactions the rhythm can be used.

The questionnaire also asked the subjects what kind of cues they thought the robot was using to learn, choosing from four options: facial expressions, tone of the voice, rhythm of the human's actions, and the repetitiveness of the action (explained as repeating the same action over and over consecutively); a free choice was also possible.

After a first phase of success, where the robot had learned successfully, the subject would accidentally cross another area of the visual field, changing the rhythm and leading the robot to forget an association, which would disturb the human and lead to further mistakes.

Since this setup still allows the system to interpret false negatives, the robot would forget the correct actions, and the subjects were then keen on trying to make the robot learn again.
