

Using the Interaction Rhythm as a Natural
Reinforcement Signal for Social Robots:
A Matter of Belief
Antoine Hiolle¹, Lola Cañamero¹, Pierre Andry², Arnaud Blanchard², Philippe Gaussier²

¹ Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire
{a.hiolle, l.canamero}@herts.ac.uk

² ETIS, ENSEA, Université de Cergy-Pontoise, CNRS
{andry, gaussier, blanchard}@ensea.fr
Abstract. In this paper, we present the results of a pilot study of a human-robot interaction experiment where the rhythm of the interaction is used as a reinforcement signal to learn sensorimotor associations. The algorithm uses breaks and variations in the rhythm at which the human is producing actions. The concept is based on the hypothesis that a constant rhythm is an intrinsic property of a positive interaction, whereas a break reflects a negative event. Subjects from various backgrounds interacted with a NAO robot, teaching it to mirror their actions by learning the correct sensorimotor associations. The results show that in order for the rhythm to be a useful reinforcement signal, the subjects have to be convinced that the robot is an agent with which they can act naturally, using their voice and facial expressions as cues to help it understand the correct behaviour to learn. When the subjects do behave naturally, the rhythm and its variations truly reflect how well the interaction is going and help the robot learn efficiently. These results mean that non-expert users can interact naturally and fruitfully with an autonomous robot if the interaction is believed to be natural, without any technical knowledge of the cognitive capacities of the robot.
1 Introduction

The question of how to build robots that are useful and adaptive in our socially situated environment is of growing interest. Indeed, in a not so distant future, humans will have to interact daily with robots in various settings. During these interactions, robots will have to gain information from humans, and humans will have to learn from robots. Concerning the non-verbal aspects of these interactions, the field of developmental robotics has been trying to develop and study algorithms and architectures that are as generalisable as possible, in order for these systems to be as minimal as possible, be that at the lower motor level [1] or at the motivational level [2]. The underlying principle of the field is to model phases and phenomena from the development of children and animals, in order to understand and take advantage of the adaptivity and efficiency we observe in them.
Within this framework, the authors have been interested in defining and testing how and when learning from a human partner can be achieved with the minimum amount of prior knowledge on the robot's part, as a young infant has to do in its early years. During this period, the most important partner the infant has is its mother, or primary caregiver. As described by Bowlby in [3], the infant uses the attachment figure, often the mother, as a secure base to explore and learn from its experiences in unknown situations. However, the question is how the mother elicits these positive responses and promotes healthier cognitive and socio-emotional development. One hypothesis is that the mother's sensitivity, as described in [4], that is, the consistency of the mother's behaviour and responses to stimuli, is crucial. The positive emotions and mutual delight that the mother promotes, together with her deep engagement, support a healthier development for the infant [5]. Additionally, in the still-face paradigm [6], a caretaker produces a neutral expression after a few minutes of interaction, which in turn produces a significant fall in the infant's positive responses. Other frameworks, such as the Double Video paradigm [7], measured the same responses when the synchrony of the interaction was altered by introducing time delays in the mother's response. This would indicate that synchrony and timing during a mother-infant non-verbal interaction are a strong indicator of the infant's pleasure and emotional responses.
Defining the notion of sensitivity in the context of human-robot interactions is far from trivial. The experiment discussed in this paper is based on our previous work [8][9][10], where we raised the question of how important the consistency of the human's behaviour is to the stability and accuracy of the learned sensorimotor associations. We attempt to unify these notions in order to build a general reinforcement signal that could be used by a robot in a large number of settings, which in turn would help humans interact with robots without any knowledge of how the robot's cognitive system is designed.
We here present an experiment where a NAO robot has to learn, without any prior knowledge, the correct sensorimotor associations in a "mirroring game". The actions of the human are mediated by a pink ball, and the robot uses the rhythm at which the human is performing a new action in order to reinforce the correct action to perform. This experiment is an extension of the work presented in [11], where the same algorithm was used in a human-computer interaction, and [12], where the setup was extended to work on an AIBO robot and then a NAO robot.

The results showed that the rhythm could be used as a reinforcement signal for the robot to learn the correct associations, even more so when the principle was explained to the subjects, since they would then explore all the possibilities they knew of. But the non-expert subjects, who were not used to interacting with robots and did not know the rule, did not significantly manage to teach the robot the associations.

If the rhythm of the interaction is, as hypothesized, an intrinsic component of a natural interaction, surely something was missing for the non-expert users. The modified version of this experiment presented here aims at discovering what was missing in the interaction for these subjects to succeed.
2 Architecture and experimental setup

Fig. 1: Experimental setup. The human partner is in front of the robot, moving a pink ball between the four different positions in the visual field. The robot learns the proper response to mirror the actions of the human partner.
In our setup, the robot is trying to learn to mirror the actions of the human partner, following the position of a pink ball in its visual field, as in Fig. 1. The robot has to learn the four different sensorimotor associations corresponding to the four possible positions (left arm up when the ball is in the top right of the visual field, left arm down when the ball is in the bottom right, and respectively for the left side). The learning algorithm itself functions as follows, and the main components are depicted in Fig. 2.
The robot has access to four different perceptions (ball in the top-left part of the visual field, ball top-right, ball bottom-left and ball bottom-right), to which it will associate an action. Every time an action is performed, the rhythm prediction component will reset and peak after a time corresponding to the last gap learned between two actions. To summarise, the robot learns in one shot the time elapsed between two different perceptions, and expects the next action to be performed after this precise duration. The reinforcement signal is calculated as the difference between the duration expected and the duration observed. The only prediction made here is the occurrence of the next action from the partner, which for the robot is a change in the current perception.
Fig. 2: Abstract representation of the robot architecture to learn the sensorimotor associations.
The reinforcement signal R(t) from the rhythm prediction module varies as a Gaussian centred on the time t, which is the time of the next predicted event (see [11] for more details). R(t) is then used to change the weights between two fully connected layers of neurons (the perception and the action to be performed). The synapses have a weight W_ij and a probability p_ij associated to them, and the rule used is the Probabilistic Learning rule. Using this rule, a fully connected neural network (perceptions connected to potential actions) behaves as follows:

∆p_ij = (ε + α × R(t)) × C_ij        (1)

p_ij(t) = H(p_ij + ∆p_ij)            (2)

with ε the learning speed, α the reinforcement factor and C_ij the average of the past activation of unit i. Then, if a random draw Rand is higher than the confidence, Rand > p_ij:

W_ij = 1 - W_ij                      (3)

p_ij = 1 - p_ij                      (4)

Using Rand promotes an exploratory behaviour when the confidence is low, and
a more exploitative behaviour when it is high.
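As an illustration of how Eqs. (1)-(4) could operate together with the rhythm reward, here is a minimal Python sketch; the matrix sizes, the constants, the running average used for C_ij and the interpretation of H as a saturation to [0, 1] are assumptions for illustration, not the authors' implementation.

import numpy as np

# Illustrative sketch of the probabilistic learning rule (Eqs. 1-4); constants,
# the running average used for C_ij, and H taken as a clip to [0, 1] are assumptions.
N_PERCEPTS, N_ACTIONS = 4, 4
EPSILON, ALPHA = 0.05, 0.5            # learning speed and reinforcement factor

W = np.random.randint(0, 2, (N_PERCEPTS, N_ACTIONS)).astype(float)  # binary weights W_ij
P = np.full((N_PERCEPTS, N_ACTIONS), 0.5)                           # confidences p_ij
C = np.zeros((N_PERCEPTS, N_ACTIONS))                               # average past activation

def rhythm_reward(timing_error, sigma=0.5):
    """R(t): Gaussian centred on the predicted time of the next event."""
    return np.exp(-(timing_error ** 2) / (2 * sigma ** 2))

def update(i, j, R):
    """Update the association between the active perception i and the performed action j."""
    C[i, j] = 0.9 * C[i, j] + 0.1                                           # recent co-activation
    P[i, j] = np.clip(P[i, j] + (EPSILON + ALPHA * R) * C[i, j], 0.0, 1.0)  # Eqs. (1)-(2)
    if np.random.rand() > P[i, j]:                                          # low confidence: explore
        W[i, j] = 1.0 - W[i, j]                                             # Eq. (3): flip the weight
        P[i, j] = 1.0 - P[i, j]                                             # Eq. (4): flip the confidence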
3 Experimental Design

The aim of the experiment is to assess if and how the human subjects are able to teach the four different sensorimotor associations without any explicit feedback or reinforcement signal being used by the robot, and without the human having prior knowledge of the signal used by the robot. If the humans are successful in that task, this will show that the rhythm is potentially an intrinsic component of non-verbal interaction that can help identify a successful interaction and allow a robot to learn without any specific reward.
In order to keep the subjects engaged and to make the robot's behaviour appear slightly more lifelike, we introduced a slow balancing movement on the robot: its torso would lean closer to the human partner and then slowly back away, at a low frequency modulated by the rewards obtained over time. Moreover, a notion of well-being has been added to the architecture in order to control the expressions of the robot. We therefore decouple the reward used to learn the sensorimotor associations from the overall behaviour of the robot. The robot will express happiness when the well-being is high, sadness when it is low, and boredom when it is low and the perceptions stay too stable, that is, when the human keeps repeating the same action over and over.
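As an illustration only, this expression selection can be sketched as a simple mapping; the thresholds, the function name and the behaviour at intermediate well-being levels below are assumptions, not the authors' values.

def choose_expression(well_being, perception_recently_changed, high=0.7, low=0.3):
    """Map the internal well-being level to an expression (illustrative thresholds)."""
    if well_being >= high:
        return "happiness"
    if well_being <= low:
        # Low well-being with an unchanging perception (the human repeating the same
        # action over and over) is expressed as boredom rather than sadness.
        return "boredom" if not perception_recently_changed else "sadness"
    return "neutral"   # intermediate well-being: no strong expression (assumption)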
Finally, one major change was made to the protocol of the experiment. Regardless of their background, all subjects heard the same guidelines, which are as follows: You will be asked to use the pink ball to teach the robot to mirror your actions. The robot is able to hear your voice, but does not understand words. The robot is able to see your face and what you are expressing. The robot will only respond to movements (a change of the position of the ball in its visual field). The LEDs in the eyes of the robot will reflect the quadrant where the robot perceives the ball, and are turned off when it cannot see the ball any more. Try and act as though you were teaching this to a 6 to 15 month old infant, who is able to process voices and faces. As a monitoring feature, we also reflected the expected rhythm in the LEDs of the robot, as an indicator for the experimenter to see how the system was performing. The LEDs would turn a brighter white the closer we got to the predicted action, and then fade the longer it was after the predicted action (see the sketch after this paragraph). The guidelines were modified to provide the human with existing, natural rewards (tone of voice and facial expressions) that they can use without having to be trained. This was also believed to raise their confidence in the overall capabilities of the robot.
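The monitoring LED behaviour can be illustrated with a small sketch; the ramp and fade constants are assumptions chosen only to show the shape of the brightness profile, not the values used on the robot.

import math

def led_brightness(t, t_predicted, sigma=0.5, fade=2.0):
    """Brightest close to the predicted action time, fading the longer we are past it."""
    if t <= t_predicted:
        # approaching the predicted action: brighten following a Gaussian ramp
        return math.exp(-((t_predicted - t) ** 2) / (2.0 * sigma ** 2))
    # after the predicted action: fade exponentially with the elapsed time
    return math.exp(-(t - t_predicted) / fade)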
We conducted a study with 10 subjects, with ages ranging from 23 to 60 years old and with various backgrounds. The robot used was the same as previously, the NAO robot (Fig. 1). The interaction would typically last ten minutes, and was ended by the experimenter. We recorded the value of the rhythm used by the human, the movements performed by the robot, and the rewards the system identified. Using this, we have enough data to know what

Citations

Journal ArticleDOI: Emotional body language displayed by artificial agents
TL;DR: Overall, these experiments confirmed that body language is an appropriate medium for robots to display emotions and suggest that an affect space for body expressions can be used to improve the expressiveness of humanoid robots.

Journal ArticleDOI: The Neurobehavioral and Social-Emotional Development of Infants and Children
TL;DR: This paper introduces three impressive tomes, The Neurobehavioral and Social-Emotional Development of Infants and Children, which attempt to integrate developmental theory with data from research studies on maternal-infant interactions.

Proceedings Article: Long-term human-robot interaction with young users
TL;DR: This paper introduces an approach that integrates multiple functional aspects necessary to implement temporally extended human-robot interaction in the setting of a paediatric ward, and presents the methodology for the implementation of a companion robot.

Proceedings ArticleDOI: Children's adaptation in multi-session interaction with a humanoid robot
TL;DR: An interesting phenomenon was observed during the experiment: most of the children soon adapted to the behaviors of the robot, in terms of speech timing, speed and tone, verbal input formulation, nodding, gestures, etc.
References

Book: Attachment and Loss, by John Bowlby.

Journal ArticleDOI: Sensitivity and Attachment: A Meta-Analysis on Parental Antecedents of Infant Attachment
TL;DR: This meta-analysis included 66 studies (N = 4,176) on parental antecedents of attachment security; the question was whether maternal sensitivity is associated with infant attachment security, and what the strength of this relation is.

Journal ArticleDOI: Intrinsic Motivation Systems for Autonomous Mental Development
TL;DR: The mechanism of Intelligent Adaptive Curiosity is presented, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress, thus permitting autonomous mental development.

Book: The neurobehavioral and social-emotional development of infants and children.
TL;DR: Tronick's Still Face Paradigm, as mentioned in this paper, has been widely used to study the development of infants' emotional capacities and coping, and to understand the nature of normal versus abnormal development.
Frequently Asked Questions

Q1. What have the authors contributed in "Using the interaction rhythm as a natural reinforcement signal for social robots: a matter of belief"?

In this paper, the authors present the results of a pilot study of a human-robot interaction experiment where the rhythm of the interaction is used as a reinforcement signal to learn sensorimotor associations.

Indeed, in the future, if robots meet an increasing number of humans and will themselves need assistance, knowing which partner is the most useful should be a clear advantage. In the future, the authors plan on confirming these results with a broader set of subjects, in age, technological and cultural background. The authors would then be able to study the possibility of conflicts between these signals and develop a system which would try and cope with the possible contradictions. The authors would also be interested to see with what kinds of different interactions the rhythm can be used.

The questionnaire also asked the subjects what kind of cues they thought the robot was using to learn, choosing from four options: facial expressions, tone of the voice, rhythm of the human's actions, and the repetitiveness of the action (explained as repeating the same action over and over consecutively); a free choice was also possible.

After a first phase of success, where the robot had learned successfully, the subject would accidentally cross another area of the visual field, changing the rhythm and leading the robot to forget an association, which would disturb the human and lead to further mistakes.

Since this setup still allows the system to interpret false negatives, the robot would forget the correct actions, and the subjects were then keen on trying to make the robot learn again.
